ABSTRACT


Parallel  computing  is  the  wave  of  the  future.  As  the  need  for  computational  power 
increases,  one  processor  is  no  longer  sufficient  to  achieve  the  speed  necessary  to  solve 
today's  complex  problems. 

The  Air  Force  Space  Command  (AFSPACECOM)  tracks  approximately  8000 
satellites  daily;  the  model  used  by  the  AFSPACECOM,  SGP4  (Simplified  General 
Perturbation  Model  Four),  has  been  the  operational  model  since  1976.  This  thesis 
contains  a  detailed  discussion  of  the  mathematical  theory  of  the  SGP4  model. 

The  tracking  of  a  satellite  requires  extensive  calculations.  The  satellite  can  be 
tracked  more  efficiently  with  parallel  processing  techniques.  The  principles  developed 
are applicable to a Naval ship tracking multiple incoming threats; the increase in the
speed  of  processing  incoming  data  would  result  in  personnel  being  informed  faster  and 
thus  allow  more  time  for  better  decisions  during  combat. 

Three  parallel  algorithms  applied  to  SGP4  for  implementation  on  a  Parallel 
Virtual  Machine  (PVM)  are  developed.  PVM  is  a  small  software  package  that  allows  a 
network  of  computer  workstations  to  appear  as  a  single  large  distributed-memory  parallel 
computer.  This  thesis  contains  a  description  of  several  algorithms  for  the  implementation 
on  PVM  to  track  satellites,  the  optimal  number  of  workstations,  and  methods  of 
distributing  data. 
I. INTRODUCTION


The  goal  of  this  thesis  is  to  illustrate  how  a  network  of  IPX  Sunstations  can  be  used  as  a 
parallel  computer  to  solve  a  complex  military  requirement  of  tracking  8000  earth  satellites  daily. 
Parallel  processing  has  already  been  used  in  Global  Climate  Modeling,  Superconductivity, 
Seismic  Imaging,  and  many  other  important  applications  in  science  today.  Additionally,  there  are 
other  important  military  applications  where  the  use  of  parallel  computing  would  be  extremely 
advantageous. For example, today's Weapon Control Systems like AEGIS have enormous
computational requirements to detect and destroy incoming threats. The use of separate
computers located at individual enclaves versus a centrally located computer will reduce the
vulnerability of a ship should it take a direct hit in the computer station. The necessary
computing power can be maintained by switching to unaffected stations; additionally, the increase in
speed of processing incoming data would result in personnel being informed faster and thus allow
more time for better decisions during combat.

Parallel  computing  is  the  wave  of  the  future.  As  the  need  for  computational  power 
increases  daily,  due  to  an  increase  in  technological  developments,  one  processor  is  no  longer 
sufficient  to  achieve  the  speed  in  computations  necessary  to  solve  today's  problems. 

Two ways one can achieve greater computational efficiency with parallel processing are

1.  Purchase  a  computer  developed  solely  for  parallel  processing  applications 

or 

2.  Use  existing  workstations  found  in  most  companies  today. 

The  first  option  requires  the  purchase  of  a  computer  like  the  INTEL  iPSC/2  Hypercube 
multicomputer. 


The INTEL iPSC/2 Hypercube at the Naval Postgraduate School was purchased in 1987 for about
$100,000.00; the Hypercube requires an additional $6000.00 per year to maintain and is used
solely for research projects.

The second option, the use of existing workstations, requires only that one be willing to
utilize the power of idle workstations' CPUs to achieve computational efficiency by dividing a
complex problem into smaller, more manageable data components.

The  average  computer  user  in  the  workplace  today  does  not  require  100  %  of  the  CPU's 
power  each  hour  of  the  day;  additionally,  at  night  the  workstations  remain  idle  until  one  logs  in 
the  next  morning  or  after  the  weekend. 

The  utilization  of  thousands  of  existing  processors  to  solve  problems  with  enormous 
computational  requirements  will  be  common  practice  in  the  future.  The  price/performance 
advantage  of  this  practice  has  not  yet  been  fully  realized;  however,  tomorrow’s  scientist  will 
wonder  how  we  achieved  the  advances  in  science  and  technology  today  with  the  use  of  serial 
processing  alone. 

Once  one  realizes  that  there  is  a  storehouse  of  computer  power  ready  to  be  distributed 
freely,  the  next  step  is  to  learn  how  to  utilize  this  power.  This  thesis  will  illustrate  how  a  network 
of  workstations  can  be  used  to  increase  the  speed  at  which  satellites  are  tracked.  This  work  will 
become  increasingly  more  important  as  the  number  of  objects  tracked  daily  steadily  increases 
and  the  number  of  calculations  required  skyrockets. 

This  is  a  continuation  of  the  Parallel  Processing  Orbital  Prediction  work  conducted  at 
Naval Postgraduate School in the Mathematics Department orchestrated by Professors D. A.
Danielson  and  B.  Neta.  In  June  1992,  Warren  E.  Phipps,  Jr.  developed  several  parallel 
algorithms for the Naval Space Surveillance Center's analytic satellite motion model. The model
is implemented in the FORTRAN subroutine PPT2. The algorithms were implemented on the
INTEL  iPSC/2  Hypercube  (Phipps,  1992).  In  March  1993,  Sara  Ostrom  studied  the  parallel 
computing  potential  of  the  Air  Force  Space  Command  analytic  satellite  motion  model 
implemented  on  the  INTEL  iPSC/2  Hypercube  (Ostrom,  1993).  Currently,  Leon  Stone  is 
implementing  parallel  algorithms  for  the  Navy's  Satellite  model  using  Parallel  Virtual  Machines. 
This body of work is the result of the implementation of the Air Force Space Command's
analytic  satellite  model,  SGP4,  using  Parallel  Virtual  Machines. 

Chapter II discusses the advantages of the Parallel Virtual Machine (PVM) in terms of
cost, availability, and fault tolerance. The history and components of PVM are discussed,
followed by a brief overview of a new extension to PVM called HeNCE. The chapter concludes
with a short discussion of other available parallel software packages such as Express, P4, and Linda.
Chapter III describes the Air Force Space Command's analytical models SGP and SGP4 and
describes, in detail, the theory behind the prediction of a satellite's position and velocity.
Chapter IV describes three algorithms developed to study the parallelization of the satellite
computer code; additionally, the performance of each algorithm is compared in
detail. The last chapter, Chapter V, contains conclusions and suggestions for further research.
II. PARALLEL VIRTUAL MACHINE


In  this  chapter,  the  advantages  of  using  a  Parallel  Virtual  Machine  (PVM)  in 
terms  of  cost,  availability,  and  fault  tolerance  factors  will  be  discussed.  The  history  and 
components  of  PVM  will  be  covered  followed  by  a  brief  overview  of  a  new  extension  to 
PVM  called  the  Heterogeneous  Network  Computing  Environment  (HeNCE).  Finally, 
other software packages like Express, P4, and Linda will be briefly described. This is a
synthesis of papers written about the Parallel Virtual Machine (see Dongarra, Geist,
Manchek, and Sunderam, 1993).

Parallel  Virtual  Machine  is  a  small  software  package  (~ Mbyte  of  C  source  code) 
that allows a heterogeneous network of Unix-based computers to appear as a single large
distributed-memory  parallel  computer.  The  PVM  package  is  good  for  large-grain 
parallelism;  that  is,  at  least  100K  bytes/node.  The  term  virtual  machine  is  used  to 
designate  a  logical  distributed-memory  computer  and  host  is  used  to  designate  one  of  the 
member  computers. 

The  PVM  software  supplies  the  functions  to  automatically  start  up  tasks  on  the 
virtual  machine  and  allows  the  tasks  to  communicate  and  synchronize  with  each  other. 
Note that a task is a unit of computation in PVM and is analogous to a UNIX process.

A  problem  can  be  solved  in  parallel  by  sending  and  receiving  messages  to 
accomplish multiple tasks. These message-passing constructs are common to most
distributed-memory  computers.  By  sending  and  receiving  messages,  multiple  tasks  of  an 
application can cooperate to solve a problem in parallel. The applications can be written
in  Fortran  77  or  C. 
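
The cooperation described above can be illustrated with a minimal sketch. It does not use PVM itself: Python threads and queues stand in for PVM tasks and messages, and the names worker and run_master are hypothetical, chosen only for this illustration. A master splits the data, sends one share to each task, and combines the partial results it receives back.

```python
import queue
import threading

def worker(task_id, inbox, outbox):
    """Receive one message, do a share of the work, send the result back."""
    chunk = inbox.get()  # blocking receive, standing in for a message-passing receive
    outbox.put((task_id, sum(x * x for x in chunk)))  # send the partial result

def run_master(data, n_tasks):
    """Split data among n_tasks workers and combine their partial results."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_tasks)]
    threads = [
        threading.Thread(target=worker, args=(i, inboxes[i], outbox))
        for i in range(n_tasks)
    ]
    for t in threads:
        t.start()
    for i, inbox in enumerate(inboxes):
        inbox.put(data[i::n_tasks])  # send each task its share of the data
    partials = dict(outbox.get() for _ in range(n_tasks))
    for t in threads:
        t.join()
    return sum(partials.values())
```

In a real PVM program the queues would be replaced by calls to the PVM library and the tasks would run on different hosts; only the send, compute, and receive pattern carries over.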

PVM  handles  all  message  conversion  that  may  be  required  if  two  computers  use 
different  data  representations.  PVM  also  includes  many  control  and  debugging  features  in 
its  user-friendly  interface.  For  instance,  PVM  ensures  that  error  messages  generated  on  a 
remote  computer  are  displayed  on  the  user's  local  screen. 

PVM  allows  these  application  tasks  to  choose  the  architecture  best  suited  to  the 
solution.  PVM  also  supports  heterogeneity  at  the  machine  and  network  levels. 

At  the  machine  level,  computers  with  different  data  formats  are  supported  as  well  as 
different  serial,  vector,  and  parallel  architectures.  At  the  network  level,  different  network 
types  can  make  up  a  Parallel  Virtual  Machine,  for  example,  Ethernet,  Fiber  Distributed 
Data  Interface  (FDDI),  token  ring,  etc. 

Users  of  PVM  can  also  configure  their  own  parallel  virtual  machine,  which  can 
overlap  with  other  users'  virtual  machines.  Configuring  a  personal  parallel  virtual 
machine  involves  simply  listing  the  names  of  the  machines  in  a  file  that  is  read  when 
PVM  is  started. 

A.  ADVANTAGES  OF  PVM 

The  first  advantage  of  using  PVM  is  a  reduction  in  cost;  it  is  and  will  continue  to 
be  costly  to  allocate  large  computing  resources  to  each  and  every  user.  The  beauty  of 
using  workstations  for  parallel  processing  is  that  a  user  of  a  workstation  may  not  use  the 
machine  all  the  time,  but  may  need  more  than  what  a  single  workstation  can  provide 
when applications are to be run. Many scientists are discovering that their computational
requirements  are  best  served  not  by  a  single,  monolithic  machine  but  by  a  variety  of 
distributed  computing  resources,  linked  by  high-speed  networks. 

The  second  advantage  in  network-based  concurrent  computing  is  the  ready 
availability  of  development  and  debugging  tools.  Typically,  systems  that  operate  on 
loosely  coupled  networks  permit  the  direct  use  of  editors,  compilers,  and  debuggers  that 
are  available  on  individual  machines;  also,  users  are  already  familiar  with  the  use  and 
individual  idiosyncrasies  of  each  tool  so  that  learning  new  skills  is  not  necessary. 

The  third  advantage  is  the  potential  fault  tolerance  of  the  network(s)  and  the 
processing  elements.  Most  multiprocessors  do  not  support  such  a  facility;  hardware  or 
software  failures  in  one  of  the  processing  elements  often  lead  to  a  complete  crash. 
Additionally,  it  is  the  opinion  of  the  author,  that  for  Naval  applications  using  different 
workstations  in  different  areas  of  a  Naval  ship  can  reduce  vulnerability  should  the  ship 
take  a  direct  hit  in  a  critical  area.  The  computing  power  needed  for  a  combat  system  like 
Aegis  could  be  continued  by  choosing  unaffected  stations. 

A study conducted by Eichelberger and Provencher (1993) explored using PVM
to  model  a  survivable  AEGIS  combat  system  for  a  CG47  Ticonderoga  class  AEGIS 
cruiser  model.  Present  naval  combat  systems  possess  only  manual  reconfiguration  and 
static  rudimentary  automatic  reconfiguration  schemes.  The  study  concluded  that  there  is 
a  significant  improvement  in  mission  readiness  when  using  a  reconfigurable  computer 
architecture. 
B. HISTORY OF PVM


In  the  summer  of  1989,  at  Oak  Ridge  National  Laboratory  (ORNL),  the 
development  of  PVM  software  began  and  is  now  distributed  freely  in  the  interest  of  the 
advancement  of  science  around  the  world.  The  driving  force  behind  the  initial 
popularity of PVM was the ability to get an excellent price/performance ratio, better than
any other computer system in the world. In general, a cluster of about 10 high
performance workstations is potentially capable of solving a problem as fast as a
supercomputer costing 20 times more; thus, PVM is rapidly becoming a de facto standard
for distributed computing. How did all this begin? The following is a brief history of
PVM's creation and its creators:


Summer 1989: Vaidy Sunderam designed and implemented the first version of
Parallel Virtual Machine while visiting Oak Ridge National Laboratory.

Summer 1990: Vaidy Sunderam and Al Geist refined the PVM software to
develop a Fortran interface and several parallel applications;
additionally, a graphical interface called XPVM was developed.

November 1990: Al Geist developed a PVM version of a large material science
application code run on a network of IBM RS/6000's which won
the 1990 Gordon Bell Prize for best price/performance ratio of any
application in the world.

December 1990: Sunderam and Geist entered their PVM research into the 1990
IBM Supercomputer competition and won first prize.

March 1991: PVM 2.0 was developed by Bob Manchek from PVM 1.0, the
earlier research version. PVM 2.0 was made publicly available
through netlib@ornl.gov.

Summer 1991: Sunderam, Geist, and Manchek began working on the design
features of PVM 3.0 such as dynamic configuration and new
routine names. Additionally, a digest for users to exchange
information was set up at pvmlist@mathcs.emory.edu.

December 1991: Beguelin began the development of a new software package called
Xab, a monitor and debugger for PVM programs. This version can
be obtained by contacting adam@cs.cmu.edu.

February 1992: PVM 2.4 was released and HeNCE was made available through
netlib@ornl.gov.

Summer 1992: Geist and his student developed a package built on top of PVM 2.4
that dynamically load balances a user's application.

February 1993: PVM 3.0 was released.

April 1993: PVM 3.1 was released.

August 1993: PVM 3.2 was released. To receive this software, send email to
netlib@ornl.gov with the message: send index from pvm3
or ftp from netlib2@cs.utk.edu directory pvm3.
C. COMPONENTS OF PVM


The PVM system is actually composed of two parts: the daemon and a library of
PVM  interface  routines. 

The  daemon  is  called  pvmd3  (sometimes  abbreviated  pvmd)  and  resides  on  all  the 
computers  making  up  the  virtual  machine.  Any  user  with  a  valid  login  can  install  this 
daemon  on  a  machine.  When  the  user  desires  to  run  a  PVM  application,  he/she  executes 
pvmd3 on one of the computers which in turn starts up pvmd3 on each of the computers
making up the user-defined virtual machine. A PVM application can then be started
from  a  Unix  prompt  on  any  of  these  computers. 

The  library  of  PVM  interface  routines  contains  routines  for  passing  messages, 
spawning  processes,  coordinating  tasks,  and  modifying  the  virtual  machine.  The  user  can 
call  any  of  these  routines  and  application  programs  must  be  linked  with  this  library  to  use 
PVM. 

D.  APPLICATIONS 

A  variety  of  applications  have  been  developed  over  the  past  few  years  using 
PVM.  Below  is  a  partial  list  of  some  of  these  applications: 


* Material Science

* Global Climate Modeling

* Atmospheric, oceanic, and space studies

* Meteorological forecasting

* 3-D ground water modeling

* Superconductivity, molecular dynamics

* Monte Carlo CFD application

* 2-D and 3-D seismic imaging

* 3-D underground flow fields

* Particle simulation

* Distributed AVS flow visualization

As a result of this thesis, one can add Orbital Prediction to this list.

Application  programs  are  composed  of  subtasks  (or  components)  at  a  moderate 
level  of  granularity.  The  programs  view  the  PVM  system  as  a  general  and  flexible 
parallel computing resource which may be accessed in three different modes:

1. Transparent - subtasks are automatically located at the most appropriate sites.

2. Architecture-dependent - the user chooses the architecture on which particular
subtasks execute.

3. Machine-specific - subtasks are located on a particular machine to exploit
particular strengths of individual machines.

During  execution,  multiple  instances  of  each  component  or  subtask  may  be 
initiated. Figure 2.1 illustrates a simplified architectural overview
of the PVM system (see Geist and Sunderam, 1993, page 3).
Figure 2.1 Simplified Architectural Overview of PVM


Application  programs  under  PVM  may  possess  arbitrary  control  and  dependency 
structures;  that  is,  at  any  point  in  the  execution  of  a  concurrent  application,  the  processes 
in  existence  may  have  arbitrary  relationships  between  each  other  and  any  process  may 
communicate  and/or  synchronize  with  any  other.  Any  specific  control  and  dependency 
structure  may  be  implemented  under  the  PVM  system  by  appropriate  use  of  PVM 
constructs  and  host  language  control  flow  statements. 

Multiprocessing on loosely coupled networks provides facilities that are normally
not available on tightly coupled multiprocessors, for example, debugging support, fault
tolerance, and profiling and monitoring to find hot-spots or load imbalances within an
application.
The disadvantages associated with networked concurrent computing are
generating and maintaining multiple object modules for different architectures,
security considerations for personal workstations, and other administrative functions.
PVM supports two auxiliary components that provide some features to overcome these
disadvantages. First, the HeNCE interface is a graphical parallel programming
paradigm. Second, PVM is being extended to work on MPP machines; it already runs
on several made by Intel, TMC, Cray, and Convex, with KSR and Sequent
underway (Geist, 1993).

E.  HETEROGENEOUS  NETWORK  COMPUTING  ENVIRONMENT  (HeNCE) 
HeNCE  simplifies  the  writing  of  parallel  programs  and  was  developed  with  two 
goals in mind:

1. Make network computing accessible without the need for extensive training in
parallel  computing 

and 

2.  Make  the  resources  best  suited  for  a  particular  phase  of  the  computation  available 
to  the  users. 

In  HeNCE  the  programmer  explicitly  specifies  parallelism  of  a  computation  by 
drawing  graphs.  The  nodes  in  a  graph  represent  user  defined  subroutines  (written  in 
either  FORTRAN  or  C)  and  the  edges  indicate  parallelism  and  control  flow.  HeNCE  will 
automatically  execute  the  subroutines  in  parallel  (whenever  possible)  across  a  network  of 
heterogeneous  machines.  HeNCE  relies  on  the  PVM  system  for  process  initialization 
and communication. If one wishes to write explicit message passing parallel programs
for a network of machines, one should use the PVM system directly.

Once  the  graph  is  complete,  HeNCE  will  automatically  write  the  parallel  program 
including  all  the  communication  and  synchronization  routines  using  PVM  calls.  HeNCE 
tools  exist  to  assist  the  user  in  compiling  this  program  for  a  heterogeneous  environment. 
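
The idea of a graph whose nodes are subroutines and whose edges are dependencies can be sketched as follows. This is not HeNCE's own scheduling algorithm, which is not described here; it is a minimal illustration, under the assumption that a node may run as soon as every node it depends on has finished. The function name parallel_waves and the node names in the example graph are hypothetical.

```python
def parallel_waves(depends_on):
    """Group nodes into waves; all nodes in one wave can run concurrently.

    depends_on maps each node name to the set of nodes it must wait for.
    """
    remaining = {node: set(deps) for node, deps in depends_on.items()}
    done = set()
    waves = []
    while remaining:
        # A node is ready once all of its dependencies have completed.
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency graph contains a cycle")
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves

# Hypothetical graph: one reader, two independent propagation steps, one collector.
graph = {
    "read_elements": set(),
    "propagate_a": {"read_elements"},
    "propagate_b": {"read_elements"},
    "collect": {"propagate_a", "propagate_b"},
}
```

For this graph the two propagation nodes fall into the same wave, which is the parallelism the drawn graph makes explicit.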

HeNCE  is  composed  of  five  integrated  graphical  tools.  Below  is  a  brief 
explanation  of  each  tool: 

1. Compose - used to specify the parallelism of an application by drawing a
graph illustrating dependencies between procedures.

2. Configure - used to specify a network of heterogeneous computers to be
used as the parallel virtual machine and to define a cost matrix between
machines and procedures.

3. Build - used to compile and install the procedures written by the
compose tool.

4. Execute - used to dynamically map procedures to machines for execution
of the application and to collect tracing information.

5. Trace - used to read the trace information and display an animation of
the execution, either in real time for debugging or later for
performance analysis.
An initial version of HeNCE is available through netlib. To obtain HeNCE,
send email to netlib@ornl.gov and next to subject type: send index from
hence. Any problems with HeNCE can be addressed to: hence@msr.epm.ornl.gov.

F.  OTHER  SOFTWARE  PACKAGES 

Various  other  software  packages  have  been  developed  that  enable  scientists  to 
write  heterogeneous  programs;  these,  as  well  as  PVM,  have  evolved  over  the  last  several 
years,  but  none  of  them  can  be  considered  fully  mature.  It  is  an  exciting  time  in 
parallel  computing  and  there  are  many  grand  challenges  for  scientists  to  explore. 

I  would  like  to  briefly  discuss  some  of  the  other  software  packages,  in  order  that 
the  reader  will  be  familiar  with  their  names  and  features  (see  Dongarra,  1993). 

Examples  of  such  other  software  packages  include  Express,  P4,  and  Linda;  however,  it 
is  important  to  note  that  these  packages  are  by  no  means  the  only  ones  in  existence.  Each 
package  is  layered  over  the  native  operating  systems,  exploits  distributed  concurrent 
processing,  and  is  flexible  and  general-purpose;  all  exhibit  comparable  performance. 
Their  differences  lie  in  their  programming  model,  their  implementation  schemes,  and 
their  efficiency. 

The Express toolkit is a collection of tools that individually address various aspects of
concurrent computation. The toolkit is developed and marketed commercially by
ParaSoft Corporation, a company started by some members of the Caltech concurrent
computation project. Express is based on beginning with a sequential version of an
application and following a recommended development life cycle culminating in a
parallel version that is tuned for optimality. The core of the Express system is a set of
libraries for communication, I/O, and parallel graphics.

P4  is  a  library  of  macros  and  subroutines  developed  at  Argonne  National 
Laboratory  for  programming  a  variety  of  parallel  machines.  P4  supports  both  the 
shared-memory  model  and  the  distributed-memory  model.  In  the  process  management 
mechanism  in  P4  there  is  a  "master"  process  and  "slave"  processes,  and  multilevel 
hierarchies  may  be  formed  to  implement  what  is  termed  a  cluster  model  of  computation. 
Shared  Memory  support  via  monitors  is  a  distinguishing  feature  of  P4;  however,  this 
feature  is  not  distributed  shared  memory,  but  is  a  portable  mechanism  for  shared  address 
space  programming  in  true  shared  memory  multiprocessors.  A  set  of  macro  extensions 
was developed at GMD (Gesellschaft für Mathematik und Datenverarbeitung in Schloss
Birlinghoven, Germany) called Parmacs. Parmacs provided Fortran interfaces and a
variety  of  high-level  abstractions  dealing  with  global  operations  to  the  P4  system. 

Linda  is  a  concurrent  programming  model  that  has  evolved  from  a  Yale 
University  research  project.  The  primary  concept  in  Linda  is  that  of  a  "tuple-space",  an 
abstraction  via  which  cooperating  processes  communicate.  The  tuple-space  concept  is 
essentially  an  abstraction  of  distributed  shared  memory,  with  one  important  difference 
(tuple-spaces  are  associative),  and  several  minor  distinctions  (destructive  and 
non-destructive  reads,  and  different  coherency  semantics  are  possible).  Applications  use 
the  Linda  model  by  embedding  constructs  that  manipulate  the  tuple  space.  Recently,  a 
new  system  technique  has  been  proposed,  at  least  nominally  related  to  the  Linda  project. 
This scheme, termed "Piranha", proposes a proactive approach to concurrent computing
where  resources  seize  tasks  from  a  well  known  location  based  on  availability  and 
suitability. 
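
To make the tuple-space idea in the Linda discussion concrete, the following is a toy, single-process sketch. It is not the Linda implementation: the class TupleSpace is hypothetical, None is used here as a wildcard field, and the read operations return None when nothing matches, whereas Linda's read operations block until a match appears. The method names follow Linda's out, rd, and in operations (in_ is used because in is a reserved word in Python).

```python
class TupleSpace:
    """Toy tuple space with associative (content-based) matching."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Add a tuple to the space."""
        self._tuples.append(tuple(fields))

    def _find(self, pattern):
        # A pattern matches a tuple of the same length whose fields are
        # equal wherever the pattern field is not the None wildcard.
        for index, candidate in enumerate(self._tuples):
            if len(candidate) == len(pattern) and all(
                p is None or p == c for p, c in zip(pattern, candidate)
            ):
                return index
        return None

    def rd(self, *pattern):
        """Non-destructive read: return a matching tuple and leave it in place."""
        index = self._find(pattern)
        return None if index is None else self._tuples[index]

    def in_(self, *pattern):
        """Destructive read: return a matching tuple and remove it."""
        index = self._find(pattern)
        return None if index is None else self._tuples.pop(index)
```

Tuples are retrieved by content rather than by address, which is the associative property that distinguishes a tuple space from ordinary shared memory.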
III. SGP AND SGP4


A. SIMPLIFIED GENERAL PERTURBATION MODEL (SGP)

The original model used by the Air Force Space Command to track satellites was
the  Simplified  General  Perturbation  model  (SGP).  The  model  was  simplified  by  the 
exclusion  of  perturbation  effects  caused  by  higher  order  terms  in  the  Legendre  expansion 
of  the  Earth's  gravitational  potential  or  other  celestial  bodies  like  the  moon  or  the  sun. 
The  model  also  assumed  the  drag  effect  on  mean  motion  as  linear  in  time;  this 
assumption  dictated  a  quadratic  variation  of  mean  anomaly  with  time.  The  drag  effect  on 
eccentricity  was  modeled  such  that  the  perigee  height  remained  constant  (Hoots  and 
Roehrich  (1980),  page  2). 

These  simplifications  allowed  an  analytic  solution  to  the  equations  of  motion. 
Although  the  solutions  are  not  as  accurate  as  numerical  techniques,  they  are 
computationally  less  expensive.  Semi-analytic  models  increase  the  accuracy  while 
decreasing  the  computational  cost.  See  Dyar  (1993)  for  comparison  of  various  models  in 
terms  of  accuracy  and  computer  time  required  on  a  Sun  Sparc  10. 

Hilton  and  Kuhlman  (1966)  developed  the  analytical  SGP  model.  SGP's 
gravitational  submodel  is  a  simplification  of  the  work  done  by  Kozai  (1959)  and  Brouwer 
(1959).  For  a  more  detailed  discussion  of  the  SGP  model  see  Hoots  and  Roehrich  (1980) 
and  Sara  Ostrom  (1993),  pp.  10-20. 
B. SIMPLIFIED GENERAL PERTURBATION MODEL FOUR (SGP4)

1.  Overview 

The  second  model,  SGP4,  was  obtained  by  a  simplification  of  a  more  extensive 
analytical  theory  developed  by  Lane  and  Cranford  (1969)  which  uses  the  solution  of 
Brouwer  (1959)  for  its  gravitational  model  and  a  power  density  function  for  its 
atmospheric  model  [Hoots  and  Roehrich  (1980),  p.2].  SGP4  had  replaced  SGP  as  the 
operational  theory  at  the  AFSPACECOM  by  1976. 

The SDP4 extension to SGP4 was developed to be valid for deep-space satellites.
The deep-space equations were developed by Hujsak (1979). SDP4 models the effects of
the moon and sun in addition to certain sectoral and tesseral Earth harmonics that
become  important  for  half-day  and  one-day  period  orbits. 

SGP4 and its extension, SDP4, are both analytical models. They identify
variations  in  terms  of  changes  in  the  osculating  elements  with  respect  to  time.  The 
models  are  more  accurate  than  the  original  SGP  model  due  to  two  factors: 

1.  The  inclusion  of  zonal  harmonics  through  J4 ;  whereas,  the  SGP  model 
only  included  zonal  harmonics  through  J,. 

2.  The  inclusion  of  a  drag  force  in  the  equations  of  motion  versus  the  linear 
simplification  of  the  SGP  model. 
The main program, DRIVER, reads the input and calls either SGP4 or SDP4. If
the satellite is "near-earth" (e.g., orbital period less than 225 minutes) then SGP4 is
called;  otherwise,  the  satellite  is  classified  "deep-space"  and  DRIVER  calls  SDP4. 

SGP4  and  SDP4  receive  input  from  the  DRIVER  and  perform  calculations 
necessary  to  return  to  the  DRIVER  the  position  and  velocity  vector  in  units  of  earth  radii 
and  minutes.  The  DRIVER  performs  a  unit  conversion  to  kilometers  and  seconds  for 
printout. 
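
The DRIVER logic described above can be sketched as follows. This is an illustration in Python rather than the Fortran code itself. The function names are hypothetical, the period is computed directly from a mean motion in radians per minute, and the Earth radius of 6378.135 km is an assumed value that should be checked against the constant used in the actual code.

```python
import math

MINUTES_DEEP_SPACE = 225.0  # period threshold quoted in the text
XKMPER = 6378.135           # assumed Earth radius in kilometers

def choose_model(mean_motion_rad_per_min):
    """Return 'SGP4' for near-earth orbits and 'SDP4' for deep-space orbits."""
    period_min = 2.0 * math.pi / mean_motion_rad_per_min
    return "SGP4" if period_min < MINUTES_DEEP_SPACE else "SDP4"

def to_km_and_seconds(position_er, velocity_er_per_min):
    """Convert earth radii and earth radii per minute to km and km/s."""
    position_km = [x * XKMPER for x in position_er]
    velocity_km_s = [v * XKMPER / 60.0 for v in velocity_er_per_min]
    return position_km, velocity_km_s
```

For example, a satellite with a 90 minute period would be routed to SGP4, while one with a period near 1436 minutes would be routed to SDP4.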

SGP4 and SDP4 both call two functions, ACTAN and FMOD2P. ACTAN is
passed the values of sine and cosine and returns the angle in radians in the range of
0 to 2π. FMOD2P is passed an angle in radians and returns that angle modulo 2π.
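
The behavior of these two helper functions can be sketched in Python as follows. This is a minimal illustration of the behavior described in the text, not a transcription of the Fortran functions; the lower-case names actan and fmod2p are used to make that distinction clear.

```python
import math

TWO_PI = 2.0 * math.pi

def fmod2p(angle):
    """Reduce an angle in radians to the range [0, 2*pi)."""
    reduced = math.fmod(angle, TWO_PI)
    if reduced < 0.0:
        reduced += TWO_PI
    # Guard against rounding up to exactly 2*pi for tiny negative inputs.
    return 0.0 if reduced >= TWO_PI else reduced

def actan(sin_value, cos_value):
    """Return the angle in [0, 2*pi) whose sine and cosine are given."""
    return fmod2p(math.atan2(sin_value, cos_value))
```

Passing both the sine and the cosine lets the function resolve the quadrant, which a single-argument arctangent cannot do.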

Additionally,  SDP4  calls  the  subroutine  DEEP.  The  first  time  DEEP  is  called 
certain  constants  already  calculated  in  SDP4  are  passed  through  an  entry  called  DPINT. 
All initialized quantities needed for deep-space prediction are calculated. At this time, it
is also determined whether the orbit is synchronous or if the orbit experiences resonance
effects.  During  initialization,  the  subroutine  DEEP  calls  the  function  THETAG.  The 
function  THETAG  obtains  the  location  of  Greenwich  at  epoch  and  converts  epoch  to 
minutes  since  1950. 

The next call from SDP4 to DEEP occurs during the secular update portion and is
via  the  entry  DPSEC.  The  secular  update  portion  of  SDP4  is  where  additional  secular 
and  long-period  resonance  effects  are  added  to  the  values  of  the  "mean"  orbital  elements. 
The final access to DEEP occurs via DPPER where the appropriate deep-space
lunar and solar periodics are added to the orbital elements.

2. Input Parameters

The  SGP4  model  uses  the  six  orbital  elements,  a  drag  factor,  and  an  epoch 
reference  time  to  predict  the  satellite  position  and  velocity  vectors  at  a  future  time. 

The six orbital elements are "mean" values obtained by removing periodic
variations in a particular way. The elements are given below along with the name
assigned to each in the SGP4 Fortran computer code:


VARIABLE NAME                                  SYMBOL IN THEORY    COMPUTER CODE

Mean Motion at Epoch                           $n_0$               XNO
Eccentricity                                   $e_0$               EO
Inclination of Orbital Plane to the Equator    $i_0$               XINCO
Right Ascension of the Ascending Node          $\Omega_0$          XNODEO
Argument of Perigee                            $\omega_0$          OMEGAO
Mean Anomaly at Epoch                          $M_0$               XMAO

Table 3-1 Classical Orbital Elements


The following diagram will be useful throughout this discussion in visualizing
the satellite's orbit and the angles given in Table 3-1 above:

3. PROGRAM SEQUENCE FLOW

The ten main steps to solve for position and velocity vectors are as follows:

1)  Recover  original  mean  motion  and  semimajor  axis  from  the  input  elements. 

2)  If  necessary,  update  the  parameter  for  the  SGP4  density  function. 

3)  Calculate  constants  using  appropriate  values  of  the  density  function  from 
step  two  above. 

4)  Account  for  the  secular  effects  of  atmospheric  drag  and  gravitation. 

5)  Add  the  long  periodic  terms. 

6)  Solve  Kepler's  equation. 

7)  Calculate  the  preliminary  quantities  needed  for  short  periodics. 

8)  Update  the  osculating  quantities  using  the  short  periodics. 

9)  Calculate  the  unit  orientation  vectors. 

10) Calculate the position and velocity vectors.

The  SDP4  model  follows  these  same  steps  with  the  addition  of  several  calls  to  the 
subroutine  DEEP  which  was  discussed  earlier. 

C.  EQUATIONS 

This section will describe the equations developed by Hoots and Roehrich (1980),
pp. 14-37. The ten main steps listed above will serve as the outline of the discussion.

A strict parallel structure exists between the computer code and the equations.

1. Recover Original Mean Motion and Semimajor Axis

The input variable for mean motion ($n_0$) requires modification, after which it is
denoted by $n_0''$. This modification to $n_0$ is accomplished as follows:

1) $n_0'' = \dfrac{n_0}{1 + \delta_0}$    (relationship of $n_0''$ to $n_0$)

where

a. $\delta_0 = \dfrac{3 k_2 (3\cos^2 i_0 - 1)}{2 a_0^2 (1 - e_0^2)^{3/2}}$

b. $k_2 = \tfrac{1}{2} J_2 a_E^2$

   $J_2$ = the second gravitational zonal harmonic of the earth
   $a_E^2$ = the equatorial radius of the earth squared

c. $a_0 = a_1 \left(1 - \tfrac{1}{3}\delta_1 - \delta_1^2 - \tfrac{134}{81}\delta_1^3\right)$

d. $\delta_1 = \dfrac{3 k_2 (3\cos^2 i_0 - 1)}{2 a_1^2 (1 - e_0^2)^{3/2}}$

e. $a_1 = \left(\dfrac{k_e}{n_0}\right)^{2/3}$ where $k_e = \sqrt{GM}$, $G$ is Newton's universal gravitational
constant and $M$ is the mass of the Earth.

2) To recover the semimajor axis use $a_0'' = \dfrac{a_0}{1 - \delta_0}$ where $\delta_0$ is the same as above.
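
The recovery step can be illustrated numerically. The Python fragment below is a sketch and not the Fortran of Hoots and Roehrich (1980). The function name is hypothetical. The constants `XKE` and `CK2` are assumed to be the values of $k_e$ and $k_2$ listed in that report for units of earth radii and minutes with $a_E = 1$, and the semimajor axis is recovered as $a_0/(1 - \delta_0)$ as in that report.

```python
import math

XKE = 0.0743669161  # assumed k_e = sqrt(GM), in (earth radii)^1.5 per minute
CK2 = 5.413080e-4   # assumed k_2 = 0.5 * J2 * aE^2, with aE = 1 earth radius


def recover_mean_motion(n0, e0, i0):
    """Return (n0'', a0'') from the input mean motion n0 [rad/min],
    eccentricity e0 and inclination i0 [rad]."""
    a1 = (XKE / n0) ** (2.0 / 3.0)
    # Common factor of delta_1 and delta_0; only the semimajor axis differs.
    factor = 1.5 * CK2 * (3.0 * math.cos(i0) ** 2 - 1.0) / (1.0 - e0 * e0) ** 1.5
    delta1 = factor / (a1 * a1)
    a0 = a1 * (1.0 - delta1 / 3.0 - delta1 ** 2 - (134.0 / 81.0) * delta1 ** 3)
    delta0 = factor / (a0 * a0)
    return n0 / (1.0 + delta0), a0 / (1.0 - delta0)
```

A mean motion given in revolutions per day must first be converted to radians per minute (multiply by $2\pi/1440$) before it is passed to this function.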

2. Update The Parameter for the SGP4 Density Function

Two parameters, $s$ and $q_0$, for the SGP4 density function may require
adjustments. The scale height parameter constant used by SGP4 is
$s = 1.01222928$ earth radii (er); $s$ changes depending on the height of the satellite at
perigee. For perigees between 98 kilometers and 156 kilometers $s$ is replaced by $s^*$,

where $s^* = a_0''(1 - e_0) - s + a_E$ with units of earth radii and where perigee height is
calculated by perigee $= [a_0''(1 - e_0) - a_E] \cdot R_E$ (kilometers) and $R_E$ is the spherical
earth radius.

For perigees below 98 kilometers, $s$ is replaced by $s^*$ where

$s^* = \dfrac{20}{\text{XKMPER}} + a_E$,    XKMPER = 6378.135 kilometers/earth radii.

It should be noted that if $s$ is changed then a term $(q_0 - s)^4$ is also replaced
by $(q_0 - s^*)^4$.

From this point on, the double-prime notation will be dropped for the mean
motion and the semimajor axis, as well as the $*$ on $s$. It will be understood
that these corrections have already been made when the symbols $n_0$, $a_0$,
and $s$ are used.
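
The selection of $s$ by perigee height can be written as a three-way branch. The Python sketch below is illustrative only. The function name is hypothetical, $a_E$ is taken as 1 earth radius, and the value of $q_0$ (an altitude of 120 kilometers expressed in earth radii) is an assumption that is not stated in the text above.

```python
XKMPER = 6378.135          # kilometers per earth radius
AE = 1.0                   # equatorial earth radius, in earth radii
S_DEFAULT = 1.01222928     # default s, in earth radii
Q0 = 1.0 + 120.0 / XKMPER  # assumed q0: 120 km altitude, in earth radii


def density_parameters(a0, e0):
    """Return (perigee height in km, s, (q0 - s)**4) for a semimajor
    axis a0 in earth radii and eccentricity e0."""
    perigee_km = (a0 * (1.0 - e0) - AE) * XKMPER
    if perigee_km >= 156.0:
        s = S_DEFAULT                        # no adjustment needed
    elif perigee_km > 98.0:
        s = a0 * (1.0 - e0) - S_DEFAULT + AE  # s* for 98 km to 156 km
    else:
        s = 20.0 / XKMPER + AE                # s* below 98 km
    return perigee_km, s, (Q0 - s) ** 4
```

Returning the fourth-power term together with $s$ keeps the two quantities consistent, since the text requires that both be replaced whenever $s$ changes.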

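3. Calculate Constants

a. The following constants are calculated for both SGP4 and SDP4:

$\theta = \cos i_0$

$\xi = \dfrac{1}{a_0 - s}$

$\beta_0 = (1 - e_0^2)^{1/2}$

$\eta = a_0 e_0 \xi$

$C_1 = B^* C_2$    ($B^*$ = drag coefficient)

$C_2 = (q_0 - s)^4 \xi^4 n_0 (1 - \eta^2)^{-7/2} \left[ a_0 \left(1 + \tfrac{3}{2}\eta^2 + 4 e_0 \eta + e_0 \eta^3\right) + \dfrac{3}{2}\,\dfrac{k_2 \xi}{(1 - \eta^2)} \left(-\tfrac{1}{2} + \tfrac{3}{2}\theta^2\right)\left(8 + 24\eta^2 + 3\eta^4\right) \right]$

$C_4 = 2 n_0 (q_0 - s)^4 \xi^4 a_0 \beta_0^2 (1 - \eta^2)^{-7/2} \left\{ \left[ 2\eta(1 + e_0\eta) + \tfrac{1}{2}e_0 + \tfrac{1}{2}\eta^3 \right] - \dfrac{2 k_2 \xi}{a_0 (1 - \eta^2)} \left[ 3(1 - 3\theta^2)\left(1 + \tfrac{3}{2}\eta^2 - 2 e_0 \eta - \tfrac{1}{2} e_0 \eta^3\right) + \tfrac{3}{4}(1 - \theta^2)(2\eta^2 - e_0\eta - e_0\eta^3)\cos 2\omega_0 \right] \right\}$

b. The following constants are calculated by SGP4 only for perigees above 220
kilometers:

$C_3 = \dfrac{(q_0 - s)^4 \xi^5 A_{3,0}\, n_0\, a_E \sin i_0}{k_2 e_0}$ where $A_{3,0} = -J_3 a_E^3$

$C_5 = 2 (q_0 - s)^4 \xi^4 a_0 \beta_0^2 (1 - \eta^2)^{-7/2} \left[1 + \tfrac{11}{4}\eta(\eta + e_0) + e_0\eta^3\right]$

$D_2 = 4 a_0 \xi C_1^2$

$D_3 = \tfrac{4}{3} a_0 \xi^2 (17 a_0 + s) C_1^3$

$D_4 = \tfrac{2}{3} a_0 \xi^3 (221 a_0 + 31 s) C_1^4$

4. Secular Effects of Atmospheric Drag and Gravitation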
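$M_0$, $\omega_0$, and $\Omega_0$ are updated as follows:

a. First, $M_{DF}$, $\omega_{DF}$, and $\Omega_{DF}$ are calculated:

1) $M_{DF} = M_0 + \dot{M}\,\Delta t$

2) $\omega_{DF} = \omega_0 + \dot{\omega}\,\Delta t$

3) $\Omega_{DF} = \Omega_0 + \dot{\Omega}\,\Delta t$

where $\Delta t = t - t_0$ = time since epoch and

$\dot{M} = n_0\left[1 + \dfrac{3 k_2 (3\theta^2 - 1)}{2 a_0^2 \beta_0^3} + \dfrac{3 k_2^2 (13 - 78\theta^2 + 137\theta^4)}{16 a_0^4 \beta_0^7}\right]$

$\dot{\omega} = n_0\left[-\dfrac{3 k_2 (1 - 5\theta^2)}{2 a_0^2 \beta_0^4} + \dfrac{3 k_2^2 (7 - 114\theta^2 + 395\theta^4)}{16 a_0^4 \beta_0^8} + \dfrac{5 k_4 (3 - 36\theta^2 + 49\theta^4)}{4 a_0^4 \beta_0^8}\right]$

$\dot{\Omega} = n_0\left[-\dfrac{3 k_2 \theta}{a_0^2 \beta_0^4} + \dfrac{3 k_2^2 (4\theta - 19\theta^3)}{2 a_0^4 \beta_0^8} + \dfrac{5 k_4 \theta (3 - 7\theta^2)}{2 a_0^4 \beta_0^8}\right]$

Recall that $k_2 = \tfrac{1}{2} J_2 a_E^2$, where

$J_2$ = the second gravitational zonal harmonic of the Earth

and $k_4 = -\tfrac{3}{8} J_4 a_E^4$, where

$J_4$ = the fourth gravitational zonal harmonic of the Earth

Note: this is the point in SDP4 where the DEEP initialization for deep-space
calculations is entered through DPINT discussed earlier.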


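b. Then $M_p$, $\omega$, and $\Omega$ are calculated by

1) $M_p = M_{DF} + \delta\omega + \delta M$

2) $\omega = \omega_{DF} - \delta\omega - \delta M$

3) $\Omega = \Omega_{DF} - \dfrac{21\, n_0 k_2 \theta\, C_1 \Delta t^2}{2 a_0^2 \beta_0^2}$

If perigee is less than 220 kilometers

$\delta\omega = \delta M = 0$

otherwise,

$\delta\omega = B^* C_3 (\cos\omega_0)\,\Delta t$

$\delta M = -\tfrac{2}{3}(q_0 - s)^4 B^* \xi^4 \dfrac{a_E}{e_0 \eta}\left[(1 + \eta\cos M_{DF})^3 - (1 + \eta\cos M_0)^3\right]$

Note: At this point SDP4 calls the secular portion of DEEP via DPSEC to add
the deep-space secular effects and long-period resonance effects to the
six orbital elements.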

c. Next, $e$, $a$, and the mean longitude, $L$, are updated as follows:

1) $e = e_0 - B^* C_4 \Delta t - B^* C_5 (\sin M_p - \sin M_0)$

2) $a = a_0 \left[1 - C_1\Delta t - D_2\Delta t^2 - D_3\Delta t^3 - D_4\Delta t^4\right]^2$

3) $L = M_p + \omega + \Omega + n_0 \left[\tfrac{3}{2} C_1 \Delta t^2 + (D_2 + 2C_1^2)\Delta t^3 + \tfrac{1}{4}(3D_3 + 12 C_1 D_2 + 10 C_1^3)\Delta t^4 + \tfrac{1}{5}(3D_4 + 12 C_1 D_3 + 6 D_2^2 + 30 C_1^2 D_2 + 15 C_1^4)\Delta t^5\right]$

If the perigee height is less than 220 kilometers then the $a$ and $L$ equations are
truncated after the $C_1$ term and the equation for $e$ is truncated after the $C_4$ term.

d. The last step in this section is to calculate $\beta$ and $n$:

1) $\beta = \sqrt{1 - e^2}$

2) $n = \dfrac{k_e}{a^{3/2}}$

Note: At this point SDP4 calls the periodics section of DEEP via DPPER to add
the deep-space lunar and solar periodics to the orbital elements.

5. Add The Long Periodic Terms

The addition of long-periodic zonal effects is accomplished by the following:

a. $a_{xN} = e\cos\omega$

b. $a_{yN} = e\sin\omega + a_{yNL}$ where $a_{yNL} = \dfrac{A_{3,0}\sin i_0}{4 k_2 a \beta^2}$

$a_{xN}$ and $a_{yN}$ are the horizontal and vertical components, respectively, of the
eccentricity vector with respect to the line of nodes vector. The following figure
illustrates the geometry of the components:


6. Solve Kepler's Equation

Solve Kepler's equation by a method of successive approximations.

Let $U = L_T - \Omega$

and $U = (E + \omega)_1$, the first term in the iteration of the sum of the eccentric anomaly and
the resulting argument of perigee. Thus,

$U_{i+1} = U_i + \Delta U_i$

for successive iterations, that is

$(E + \omega)_{i+1} = (E + \omega)_i + \Delta(E + \omega)_i$

Let $EPW = E + \omega$; then

$\Delta(EPW)_i = \dfrac{U - a_{yN}\cos(EPW)_i + a_{xN}\sin(EPW)_i - (EPW)_i}{-a_{yN}\sin(EPW)_i - a_{xN}\cos(EPW)_i + 1}$

Continue iterations until $|\Delta(EPW)_i| < 1.0 \times 10^{-6}$, then set $E + \omega = (E + \omega)_i$.
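
The successive approximation is a Newton iteration on the function $U - a_{yN}\cos(EPW) + a_{xN}\sin(EPW) - EPW$. A minimal Python sketch follows, assuming the correction formula of Hoots and Roehrich (1980) for $\Delta(EPW)_i$ and a tolerance of $10^{-6}$. The function name and the cap of ten iterations are assumptions made for the illustration.

```python
import math


def solve_kepler(u, axn, ayn, tol=1.0e-6, max_iter=10):
    """Iterate EPW = E + omega until the correction is smaller than tol."""
    epw = u  # first approximation: (E + omega)_1 = U
    for _ in range(max_iter):
        sin_epw, cos_epw = math.sin(epw), math.cos(epw)
        # Newton correction: residual divided by minus its derivative.
        delta = (u - ayn * cos_epw + axn * sin_epw - epw) / (
            1.0 - ayn * sin_epw - axn * cos_epw)
        epw += delta
        if abs(delta) < tol:
            break
    return epw
```

For the small eccentricities typical of near-earth satellites the loop converges in a few passes, because the denominator stays close to one.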


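7. Short Periodic Preliminary Calculations

The following equations are the preliminary calculations; the results are added in
section eight to obtain the osculating quantities:

a. $e\cos E = a_{xN}\cos(EPW) + a_{yN}\sin(EPW)$

b. $e\sin E = a_{xN}\sin(EPW) - a_{yN}\cos(EPW)$

c. $e_L = (a_{xN}^2 + a_{yN}^2)^{1/2}$

d. $p_L = a(1 - e_L^2)$

e. $r = a(1 - e\cos E)$

f. $\dot{r} = k_e \dfrac{\sqrt{a}}{r}\, e\sin E$

g. $r\dot{f} = k_e \dfrac{\sqrt{p_L}}{r}$

With $Temp1 = \dfrac{1}{1 + \sqrt{1 - e_L^2}}$:

h. $\cos u = \dfrac{a}{r}\left[\cos(EPW) - a_{xN} + a_{yN}(e\sin E)\cdot Temp1\right]$

i. $\sin u = \dfrac{a}{r}\left[\sin(EPW) - a_{yN} - a_{xN}(e\sin E)\cdot Temp1\right]$

j. $u = \tan^{-1}\left(\dfrac{\sin u}{\cos u}\right)$

k. $\Delta r = \dfrac{k_2}{2 p_L}(1 - \theta^2)\cos 2u$

l. $\Delta u = -\dfrac{k_2}{4 p_L^2}(7\theta^2 - 1)\sin 2u$

m. $\Delta\Omega = \dfrac{3 k_2 \theta}{2 p_L^2}\sin 2u$

n. $\Delta i = \dfrac{3 k_2 \theta}{2 p_L^2}\sin i_0 \cos 2u$

o. $\Delta\dot{r} = -\dfrac{k_2 n}{p_L}(1 - \theta^2)\sin 2u$

p. $\Delta r\dot{f} = \dfrac{k_2 n}{p_L}\left[(1 - \theta^2)\cos 2u - \tfrac{3}{2}(1 - 3\theta^2)\right]$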

8. Update The Osculating Quantities

Now, the short periodic preliminary results are added to obtain the osculating
quantities:

a. $r_k = r\left[1 - \dfrac{3}{2}\, k_2\, \dfrac{\sqrt{1 - e_L^2}}{p_L^2}\,(3\theta^2 - 1)\right] + \Delta r$

b. $u_k = u + \Delta u$

c. $\Omega_k = \Omega + \Delta\Omega$

d. $i_k = i + \Delta i$

e. $\dot{r}_k = \dot{r} + \Delta\dot{r}$

f. $r\dot{f}_k = r\dot{f} + \Delta r\dot{f}$

9. Calculate Unit Orientation Vectors

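The osculating angles found above are utilized to find the unit orientation vectors
as follows:

$\mathbf{M} = \begin{bmatrix} M_x \\ M_y \\ M_z \end{bmatrix} = \begin{bmatrix} -\sin\Omega_k \cos i_k \\ \cos\Omega_k \cos i_k \\ \sin i_k \end{bmatrix}$

$\mathbf{N} = \begin{bmatrix} N_x \\ N_y \\ N_z \end{bmatrix} = \begin{bmatrix} \cos\Omega_k \\ \sin\Omega_k \\ 0 \end{bmatrix}$

then $\mathbf{U} = \mathbf{M}\sin u_k + \mathbf{N}\cos u_k$

and $\mathbf{V} = \mathbf{M}\cos u_k - \mathbf{N}\sin u_k$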

10. Calculate The Position And Velocity Vectors

Finally, the position and velocity vectors are calculated as follows:

$\mathbf{r} = r_k \mathbf{U}$

$\dot{\mathbf{r}} = \dot{r}_k \mathbf{U} + (r\dot{f})_k \mathbf{V}$

This results in the position and velocity in units of earth radii and minutes. The
position and velocity vectors are then passed to the DRIVER at which time the unit
conversion to kilometers and seconds is accomplished.
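
Steps nine and ten, together with the unit conversion performed by the DRIVER, can be sketched compactly in Python. This is an illustration rather than the Fortran source. The function and argument names are hypothetical, the orientation vectors are assumed to have the form given by Hoots and Roehrich (1980), and XKMPER = 6378.135 kilometers per earth radius is used for the conversion.

```python
import math

XKMPER = 6378.135  # kilometers per earth radius


def position_velocity(rk, rdotk, rfdotk, uk, nodek, ik):
    """Return (position, velocity) in earth radii and earth radii/minute
    from the osculating quantities r_k, rdot_k, (r fdot)_k, u_k, Omega_k, i_k."""
    sin_u, cos_u = math.sin(uk), math.cos(uk)
    m = (-math.sin(nodek) * math.cos(ik),
         math.cos(nodek) * math.cos(ik),
         math.sin(ik))
    n = (math.cos(nodek), math.sin(nodek), 0.0)
    u_vec = [mi * sin_u + ni * cos_u for mi, ni in zip(m, n)]
    v_vec = [mi * cos_u - ni * sin_u for mi, ni in zip(m, n)]
    pos = [rk * ui for ui in u_vec]
    vel = [rdotk * ui + rfdotk * vi for ui, vi in zip(u_vec, v_vec)]
    return pos, vel


def to_km_and_seconds(pos, vel):
    """Convert earth radii to kilometers and earth radii/minute to km/s."""
    return ([x * XKMPER for x in pos],
            [v * XKMPER / 60.0 for v in vel])
```

With all three angles set to zero the position lies along the first axis and the transverse velocity along the second, which gives a quick check of the sign conventions.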
IV. PARALLELIZATION OF SGP4 USING PVM


A. OVERVIEW

The goals of this chapter are two-fold:

1. Explain how the Air Force Space Command's satellite code was parallelized
using the Parallel Virtual Machine and

2. Compare various algorithms in terms of total time, communication overhead,
speedup, and efficiency.

a. Speedup ($S_p$) is calculated as follows:

$S_p = \dfrac{T_1}{T_p}$

where

$T_1$ = Endtoend Time on a Single Processor
$T_p$ = Endtoend Time on p Processors

Note: Endtoend Time will be the term used to denote the total time to
execute the program not including the time to read the input file.

b. Efficiency is calculated by:

$E = \dfrac{S_p}{p}$

where

$S_p$ = Speedup for p processors
$p$ = Number of processors
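
The two measures can be expressed directly in Python. The sketch assumes speedup is the ratio of the single-processor endtoend time to the endtoend time on p processors, and the function names are hypothetical.

```python
def speedup(t1, tp):
    """Speedup S_p: single-processor endtoend time over p-processor time."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Efficiency E: speedup divided by the number of processors p."""
    return speedup(t1, tp) / p
```

As a purely hypothetical example, a program that takes 120 seconds on one processor and 40 seconds on four processors has a speedup of 3 and an efficiency of 0.75.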
Three algorithms were developed to study the performance of the parallelization
of the satellite code. The algorithms were based upon previous work completed by Ford
and Carvalho (1993).

Data was collected for each algorithm; each execution time is the result of an
average of ten recorded run times.

Analysis was performed on each algorithm's results by comparing each model's
performance and the use of four, eight, and sixteen nodes to execute the tasks.

It is important to note that with the use of an open network of computers there is
undoubtedly going to be fluctuating machine and network loads. Multiple users and other
competing PVM tasks cause the machine and network loads to change dynamically; thus,
in order to have sufficient balancing, great care was taken to collect data at times when
the load on the system was relatively constant. However, due to the fluctuation of open
networks, exact reproduction of the data would be impossible.

In  addition  to  the  system  load  discussed  above,  one  needs  to  consider  Load 
Balancing.  Load  Balancing  refers  to  the  degree  to  which  all  nodes  are  working  to  solve 
the  problem  at  hand.  There  are  generally  three  types  of  Load  Balancing  according  to 
Geist  (1993): 

1.  Static  Load  Balancing 

The problem is divided into separate tasks which are assigned to the
processors only once. The number or size of each task can be varied
to utilize different computational powers of machines.

2. Dynamic Load Balancing by Pool of Tasks

This is usually used with a Master and Slave scheme; the master continues to
deal tasks to idle slaves until the task queue is empty. This results in the faster
processors receiving more tasks.

3. Dynamic Load Balancing by Coordination

Typically used by Single Program Multiple Data Stream (SPMD) where each
processor receives a single set of instructions, receives and manipulates data,
and redistributes its work at fixed times.

The second type, Dynamic Load Balancing by a Pool of Tasks, in which a Master
and Slave scheme exists, was utilized in this research.

The Master/Slave approach is currently a popular distributed programming
scheme. The Master starts all the Slave tasks and coordinates their work and
input/output. All three algorithms developed use a Master/Slave approach.

Two other distributed programming schemes are the "hostless" Single Program
Multiple Data (SPMD) and the Functional schemes (Geist, 1993). The "hostless" SPMD
uses the same program executed on different pieces of the problem, whereas the
Functional scheme consists of several programs, each of which performs a different
function in the application.

B. INPUT DATA


Approximately 8000 satellites are tracked by the Air Force Space Command
(AFSPACECOM) in Colorado Springs daily; thus, a file consisting of 8000 satellite entries
was created. Note that the same near-earth record and deep-space record were copied to
generate the 8000 input records.

Each entry or input record consists of twenty-two individual numerical values.
Table 4-1 illustrates a typical input record used.

Note that the input record used by AFSPACECOM consists of seventeen
individual numerical values (see Hoots and Roehrich, 1980, p. 91). Table 4-2
illustrates a typical AFSPACECOM record.

There is a direct correspondence between the first 17 values of the input record
used in this research and the first 16 values of the AFSPACECOM record. The
seventeenth entry in the AFSPACECOM record is the epoch revolutions that have been
recorded since the object was first launched. Note that this information is not used to
calculate the position and velocity vectors of the satellite.

The entries 18-22 in Table 4-1 simulate the number of calls made either to SGP4
or SDP4 per input record, as will be explained later.

No.   Name                Explanation                      Example

 1    Cardno              2 card format                    1
 2    Satellite number    Satellite ID                     (illegible)
 3    YR                  Year                             93
 4    RDAY                Day                              275.98708465
 5    XNDOT               Derivative of mean motion        0.01431103
 6    XN2DT               2nd derivative of mean motion    (illegible)
 7    IE                  Exponent of XN2DT                0
 8    BTERM               Drag term                        0.14311
 9    IE2                 Exponent of BTERM                -1
10    EPHTYP              Ephemeris type                   (illegible)
11    ICRDN02             Card number 2                    2
12    XINCO               Inclination                      46.7916
13    XNODEO              Right ascension                  230.4354
14    EO                  Eccentricity                     0.7318036
15    OMEGAO              Argument of perigee              47.4722
16    XMAO                Mean Anomaly                     10.4117
17    XNO                 Mean motion                      2.28537848
18    IYR                 Start year                       93
19    SRDAY               Start day                        276.98708465
20    JYR                 Stop year                        93
21    SPDAY               Stop day                         277.98708465
22    DELTA               Time step in minutes             60

Table 4-1 Example of Input Record

No.   Name                        Explanation                                   Example

 1    Cardno                      2 card format                                 1
 2    Satellite number            Satellite ID                                  00603U
 3    International designator                                                  193-022B
 4    Epoch time                  Year and day; the first 2 digits are the      93162.71380248
                                  year, the others are the day
 5    $\dot{n}$ or B              Mean motion derivative (rev/day$^2$) or       0.00073094
                                  B (m$^2$/kg)
 6    mean motion dot dot         Mean motion 2nd derivative/6                  0
 7    BSTAR                                                                     45562-3
 8    Ephtype                     Denotes model: 2 is for SGP4                  2
 9    Element No.                                                               864
10    Satellite number            Satellite number of card 2                    00603U
11    $i_0$                       Inclination                                   89.8623
12    $\Omega_0$                  Right ascension                               245.9276
13    $e_0$                       Eccentricity                                  .0006273
14    $\omega_0$                  Argument of perigee                           337.4473
15    $M_0$                       Mean anomaly                                  22.6464
16    $n_0$                       Mean motion (rev/day)                         15.03410461
17    Epoch rev                   Epoch revolutions                             59663

Table 4-2 Typical Input values for AFSPACECOM


The entry number 17 in Table 4-1 and entry number 16 in Table 4-2, the mean
motion (XNO), determines whether or not the satellite is a deep-space object. SGP4
propagates data for near-earth satellites, which require more frequent tracking due to the
atmospheric drag factor, and SDP4 propagates data for the deep-space satellites.

In order for an object to be classified as deep-space the period must be greater
than 225 minutes. The period is calculated by

$T = \dfrac{1440\ \text{min/day}}{XNO}$

For a period greater than 225 minutes XNO must be less than 6.4 since:

$T = \dfrac{1440\ \text{min/day}}{XNO} > 225\ \text{min}$

Rearrange and solve for XNO:

$XNO < \dfrac{1440\ \text{min/day}}{225\ \text{min}}$

That is,

$XNO < 6.4\ \dfrac{\text{rev}}{\text{day}}$

Thus, the example in Table 4-1 illustrates a deep-space satellite and the example
in Table 4-2 illustrates a near-earth satellite.
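
The classification rule amounts to one comparison. The Python sketch below assumes the period in minutes is 1440 divided by XNO in revolutions per day. The function names are hypothetical, and the strict inequality follows the wording above ("greater than 225 minutes").

```python
MINUTES_PER_DAY = 1440.0
DEEP_SPACE_PERIOD_MIN = 225.0


def period_minutes(xno_rev_per_day):
    """Orbital period in minutes for a mean motion in revolutions per day."""
    return MINUTES_PER_DAY / xno_rev_per_day


def select_model(xno_rev_per_day):
    """Return the name of the propagator that applies to this mean motion."""
    if period_minutes(xno_rev_per_day) > DEEP_SPACE_PERIOD_MIN:
        return "SDP4"
    return "SGP4"
```

Applied to the two tables, the mean motion 2.28537848 of Table 4-1 gives a period of roughly 630 minutes and selects SDP4, while the mean motion 15.03410461 of Table 4-2 gives roughly 96 minutes and selects SGP4.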
Out of the 8000 satellites tracked, approximately 85% are near-earth and 15% are
deep-space; therefore, 6800 of the 8000 input records (consisting of 22 elements each)
were near-earth and the remaining 1200 records were deep-space.

The requirement for more frequent tracking of near-earth satellites was simulated
by requiring 72 calls to the SGP4 subroutine per input record, resulting in 72 output
records generated per input record. If the satellite was deep-space the SDP4 subroutine
was called 24 times per input record, resulting in 24 output records generated per input
record. The values 72 and 24 were chosen to parallel the work done by Ostrom (1993). The
output record consisted of the time since the last propagation, three components of the
position vector, and three components of the velocity vector for a total of 7 output data
elements per output record.

To illustrate how this was accomplished, consider the input record in Table 4-1.
The difference between the start day and the stop day is one day, or 1440 minutes. The time
step of 60 minutes/call (over a period of 1440 minutes) resulted in 24 calls to the SDP4
subroutine.
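
The call count per record, and the resulting size of the output, follow from simple arithmetic. The Python sketch below is illustrative. The function names are hypothetical, and it assumes the start and stop days fall in the same year, as they do in Table 4-1.

```python
def calls_per_record(start_day, stop_day, delta_minutes):
    """Number of propagation calls between start and stop (same year)."""
    span_minutes = (stop_day - start_day) * 1440.0
    return round(span_minutes / delta_minutes)


def total_output_records(near_earth=6800, deep_space=1200,
                         near_calls=72, deep_calls=24):
    """Output records generated for the whole 8000-record catalog."""
    return near_earth * near_calls + deep_space * deep_calls
```

For the record of Table 4-1, `calls_per_record(276.98708465, 277.98708465, 60)` gives 24. With 72 calls for each of the 6800 near-earth records and 24 calls for each of the 1200 deep-space records, one run produces 518,400 output records of 7 elements each.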

C.  ALGORITHMS 

1.  Overview 

Three algorithms were considered in order to maximize load balancing and
minimize communication overhead. All three algorithms used PVM to simulate a 2D torus
topology. A 2D torus is like a 2D mesh with the addition of communication links between
the nodes located at the "edge" of the mesh.

2. Methods


a. Sequential

The Sequential program was developed to be the most efficient obtainable,
in order to ensure that the recorded speedup values would not be misleading.

(1) Sequential Algorithm

READ DATA FILE
REPEAT
    CALL PROPAGATION SUBROUTINE
UNTIL all input records have been converted to position and
      velocity vectors
COLLECT timing statistics

The  sequential  program  can  be  found  in  Appendix  A. 

b.  Parallel 

In the following discussion the term "node" will denote one Unix-based
workstation in a given network; specifically, one Sun Microsystems
SPARCstation IPX.

In order to maximize the load balancing, a dynamic load balancing method by a
pool of tasks was utilized. One node was designated the "Master" while the other nodes
became the "Slaves". One of the slave nodes was designated as a collecting node. A
separate collecting node is an advantage over having the master collect, since collection
will begin before distribution is complete. This is also similar to the configuration used
by Phipps (1992) and Ostrom (1993) in their work on parallel orbit prediction on the
INTEL Hypercube.

When four nodes were utilized, one node acted as the master and dealt tasks to two
working nodes to complete. The remaining node acted as the collector by collecting the
results from the working nodes and returning the results to the master. The research
conducted by Ford and Carvalho (1993) concluded that a separate collecting node is a
definite advantage over having the master collect, since collection can begin even before
the distribution is complete.

In a similar fashion, when eight nodes were utilized there was a total of six working
nodes and when sixteen nodes were utilized there was a total of fourteen working nodes.

3.  Parallel  Algorithms 

a.  Answer  Back  Method  (ABM) 

The first approach was to minimize the time a worker spent idle waiting for more
data. The requirement was that the slave notify the master when it had completed its tasks
and was ready for more data. This would result in the fastest workers processing the most
data. The algorithm for the Master Program is as follows:

READ entire satellite catalog input file
ENROLL in PVM and spawn n + 1 slaves
DESIGNATE 1 collector and n workers
REPEAT
    PACK m sets of satellite input records
    SEND data to worker
UNTIL each worker has m sets each
REPEAT
    PACK m sets of satellite input records
    WAIT until worker sends ready signal
    SEND data to worker
UNTIL all complete sets of m have been sent
REPEAT
    PACK any leftover satellite input records
    WAIT until worker sends ready signal
    SEND data to worker
UNTIL 8000 input records have been sent
SEND stop signal to workers
WAIT for program complete signal from collector
GATHER and compute timing statistics from slaves.

The algorithm for the Answer Back slave program is as follows:

INITIALIZATION
IF I am the collecting node
    REPEAT
        WAIT for one set of results
        STORE results
    UNTIL all results have been collected from the workers
    SEND program complete signal to master
ELSE
    I'm a working node
    REPEAT
        WAIT for data packet from master
        REPEAT
            UNPACK data
            CALL propagation subroutine
            PACK results
            SEND results to the collector
        UNTIL no more input records in the packet
        SEND ready for more data signal to the master
    UNTIL master sends stop signal
END IF.


The Answer Back program can be found in Appendix A.
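
The control flow of the Answer Back Method can be imitated on a single machine with threads and queues. The Python sketch below is not the PVM program of Appendix A: threads stand in for nodes, queues stand in for PVM messages, and the `propagate` argument is a placeholder for the SGP4/SDP4 call. All names are hypothetical, and Python threads give no real parallel speedup; the sketch shows only the ordering of messages.

```python
import queue
import threading


def run_answer_back(records, m, n_workers, propagate):
    """Deal packets of m records; a worker gets more only after answering back."""
    inboxes = [queue.Queue() for _ in range(n_workers)]  # master -> worker
    ready = queue.Queue()     # worker -> master: "ready for more data"
    results = queue.Queue()   # worker -> collector
    collected = []

    def worker(wid):
        while True:
            packet = inboxes[wid].get()
            if packet is None:          # stop signal from the master
                break
            for record in packet:
                results.put(propagate(record))
            ready.put(wid)              # answer back to the master

    def collector():
        for _ in range(len(records)):
            collected.append(results.get())

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    threads.append(threading.Thread(target=collector))
    for t in threads:
        t.start()

    # The last packet holds any leftover records (fewer than m).
    packets = [records[i:i + m] for i in range(0, len(records), m)]
    for wid, packet in zip(range(n_workers), packets):
        inboxes[wid].put(packet)        # initial deal, no waiting
    for packet in packets[n_workers:]:
        inboxes[ready.get()].put(packet)  # next packet goes to first idle worker
    for box in inboxes:
        box.put(None)
    for t in threads:
        t.join()
    return collected
```

Because each remaining packet goes to whichever worker signals first, a faster worker naturally receives more packets, which is the load-balancing property described above. Results arrive at the collector in completion order, not input order.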
b.  Successive  Deal  Methods 

The second and third algorithms were developed to decrease the communication
time between the master and slaves. The input records were dealt to the workers in sets
m at a time. After giving each worker an initial set, the master continued to deal input
records until all 8000 records had been sent.

The successive deal methods are basically the same; the difference lies in the way
the input data is dealt to each worker.

In the second algorithm (Successive Deal Model I), to study the result of sending
larger data packets, each worker is dealt an input data set consisting of m records with 22
elements each. Next, 1/(2*p) of the remaining records are dealt to each worker. Finally,
1/p of the remaining records is dealt to each worker. Note that if any records are left over as
a result of the integer division, the leftovers are sent last. For example, if

n = number of data records
m = number of records sent simultaneously
p = number of working processors or nodes
s = sets of m records to be distributed

and we let n = 8000
m = 15
p = 2

Then, the number of sets to be distributed is

$s = \dfrac{8000\ \text{records}}{15\ \text{records/set}} = 533$ sets of 15

with 5 input records left over. Now, a set is sent to each worker, leaving a total of 531 sets
left to be distributed. Next, 1/(2*p) of the remaining sets are dealt to each worker, that is,

$\dfrac{1}{2 \cdot 2} \cdot (531\ \text{sets}) = 132$ sets are given to each worker.

Thus, the number of sets left to be distributed is

s = 531 - (2*132) = 267 sets.

Next, 1/p of the remaining sets are dealt to each worker, that is, (1/2)*267 sets = 133 sets are
distributed to each worker, leaving 1 set left over. Finally, the leftovers are sent to a worker
and all the input records have been distributed.
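
The arithmetic of the Successive Deal Model I example can be checked with integer division. The Python sketch below is illustrative only. The function name and dictionary keys are hypothetical, and it assumes there are at least p complete sets.

```python
def successive_deal_1(n, m, p):
    """Sets dealt to each worker in each phase of Successive Deal Model I."""
    total_sets, leftover_records = divmod(n, m)
    remaining = total_sets - p       # phase 1: one set to each worker
    phase2 = remaining // (2 * p)    # phase 2: 1/(2p) of the remaining sets
    remaining -= p * phase2
    phase3 = remaining // p          # phase 3: 1/p of what is still left
    remaining -= p * phase3
    return {
        "total_sets": total_sets,
        "leftover_records": leftover_records,
        "phase1_sets_per_worker": 1,
        "phase2_sets_per_worker": phase2,
        "phase3_sets_per_worker": phase3,
        "leftover_sets": remaining,
    }
```

For n = 8000, m = 15, and p = 2 this returns 533 total sets, 132 sets per worker in the second phase, 133 sets per worker in the third phase, 1 leftover set, and 5 leftover records.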

In the third algorithm, the Successive Deal Model II, the master deals out one set
consisting of m input records to each worker. Then, the master continues to deal out data
sets until all the records have been distributed. For example, using the variables defined
above, let

n = 8000
m = 15
p = 2

then,

$s = \dfrac{8000\ \text{records}}{15\ \text{records/set}} = 533$ sets + 5 records left over.

First, one set is given to each worker, resulting in 531 sets left. Then, the sets would be
distributed, one at a time, first to one worker and then to the other worker. Last, the
leftover records are sent.
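
The Successive Deal Model II distribution is a round-robin deal of complete sets. A minimal Python sketch follows. The function name is hypothetical, and it counts only how many sets each worker receives.

```python
def successive_deal_2(n, m, p):
    """Return (sets received by each worker, leftover records) when
    complete sets of m records are dealt one at a time in turn."""
    total_sets, leftover_records = divmod(n, m)
    per_worker = [0] * p
    for k in range(total_sets):
        per_worker[k % p] += 1  # first to one worker, then to the next
    return per_worker, leftover_records
```

For n = 8000, m = 15, and p = 2 the two workers receive 267 and 266 sets, and 5 leftover records remain to be sent last.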

(1) Successive Deal Method I (SDI) Algorithm

Master Algorithm:

READ entire satellite catalog input file
ENROLL in PVM and spawn n + 1 slaves
DESIGNATE 1 collector and n workers
REPEAT
    PACK one set of m input records
    SEND data to worker
UNTIL each worker has one set
REPEAT
    PACK 1/(2*p) of the remaining records
    SEND data to worker
UNTIL each worker has a packet
REPEAT
    PACK remaining sets
    SEND data to worker
UNTIL each worker has an equal packet
REPEAT
    PACK leftovers
    SEND leftovers
UNTIL all input records have been sent
SEND stop signal to workers
WAIT for program complete signal from collector
GATHER and compute timing statistics from slaves.

Slave Algorithm:

INITIALIZATION
IF I am the collecting node
    REPEAT
        WAIT for one set of results
        STORE results
    UNTIL all results have been collected from the workers
    SEND program complete signal to master
ELSE
    I'm a working node
    REPEAT
        WAIT for data packet from master
        REPEAT
            UNPACK data
            CALL propagation subroutine
            PACK results
            SEND results to the collector
        UNTIL no more input records in the packet
    UNTIL master sends stop signal
END IF.

(2) Successive Deal Model II (SDII) Algorithm
Master Algorithm:

READ entire satellite catalog input file
ENROLL in PVM and spawn n + 1 slaves
DESIGNATE 1 collector and n workers
REPEAT
    PACK one set of m input records
    SEND one set to each worker
UNTIL each worker has one set
REPEAT
    PACK one set of m input records
    SEND data to worker
UNTIL all sets of m have been distributed
REPEAT
    PACK remaining input records
    SEND data to worker
UNTIL all input records have been distributed
SEND stop signal to workers
WAIT for program complete signal from collector
GATHER and compute timing statistics from slaves.

Slave Algorithm:

INITIALIZATION
IF I am the collecting node
    REPEAT
        WAIT for one set of results
        STORE results
    UNTIL all results have been collected from the workers
    SEND program complete signal to master
ELSE
    I'm a working node
    REPEAT
        WAIT for data packet from master
        REPEAT
            UNPACK data
            CALL propagation subroutine
            PACK results
            SEND results to the collector
        UNTIL no more input records in the packet
    UNTIL master sends stop signal
END IF.


For the source code of the algorithms discussed above see Appendix A. The
programs developed were written in C. The SGP4 code is written in FORTRAN. The
C framework using a PVM architecture calling a FORTRAN satellite propagation
subroutine was successful.

D.  PROGRAM  OVERVIEW 

1.  Sequential 

The sequential version was executed 10 times and the total run times were
averaged. This was done four times and the four average values were averaged, resulting
in a sequential time $T_1$, which is used in the calculation of speedup.

The total time for the program to execute did not include the initial time to read the
entire input catalog because this was done one time only at the beginning of each program.
From this point on the total time to execute the program, excluding read time, will be
called endtoend time. The sequential average endtoend time was used
in the calculation of speedup, which will be discussed in the Parallel section below.

2.  Parallel 

hi  each  program  discussed  under  die  Parallel  Algorithm  section  above,  time  docks 
woe  inserted  at  various  locations  in  order  to  measure  die  time  to  read  the  entire  input 
catalog,  die  endtoend  time,  the  worker's  communication  time,  and  the  worker's  calculation 
time. 
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The  number  of  satellite  input  records  (consisting  of  the  22  input  values)  sent 
simultaneously  to  each  worker  was  chosen  to  be  other  5, 10,  IS,  20, 2S,  30,  3S,  40,  4S, 
SO,  or  SS.  This  was  based  upon  previous  work  done  by  Ford  and  Carvalho  (1993). 

The  number  of  nodes  utilized  was  4,  8,  or  16.  To  configure  the  personal  parallel 
virtual  machine,  a  list  of  names  of  the  Unix-based  machines  used  was  listed  in  a  file  called 
hostfie.  When  PVM  was  started  by  die  command  pvmd3  hostflle  & ,  he  hostfile  was 
automatically  read  and  die  machines  were  ready  to  act  as  nodes  in  a  parallel  application. 

The  machine  from  which  the  application  was  started  acted  as  die  master  and  die 
slave  nodes  woe  spawned  by  first  specifying  die  number  of  nodes  desired  (numnodcs) 
and  then  executing  the  statement 

num  =  pvm_spawn(SLAVENAME,  (char**)  0,  0,  num  nodcs,  tids). 

The selection of 4, 8, or 16 nodes was based upon previous work done by Ostrom (1993)
in the parallelization of the SGP4 code using the Naval Postgraduate School INTEL
iPSC/2 Hypercube. This is a Multiple Instruction stream, Multiple Data stream (MIMD)
multicomputer. It consists of a system resource manager called the host, and eight
individual processors, referred to as nodes.

Data for each set of choices discussed above was collected for ten iterations of the
entire program and these results were averaged.

a. Analysis

For endtoend time, percent worker communication, speedup, and
efficiency, two comparisons were analyzed to measure the performance
of each algorithm:

(1) For a given algorithm, the performance of four, eight, and sixteen
nodes was compared, and

(2) For a given number of nodes, the performance of the three algorithms
was compared.

For both cases above the number of satellite input records sent
simultaneously was either 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 55.
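
As an illustration of what a given packet size means for the master, the sketch below splits a catalog into full packets plus a final short packet. It mirrors the integer arithmetic used for sets and leftover in the master listing of Appendix A; the function name split_catalog is hypothetical, and the 8000-record catalog in the usage note is simply the catalog size quoted in this chapter.

```c
#include <assert.h>
#include <stddef.h>

/* Split a catalog of `records` input records into packets of `packet`
   records each. On return *sets holds the number of full packets and
   *leftover the number of records remaining for a final short packet. */
void split_catalog(int records, int packet, int *sets, int *leftover)
{
    assert(records >= 0 && packet > 0);
    assert(sets != NULL && leftover != NULL);
    *sets = records / packet;                  /* integer division */
    *leftover = records - (*sets) * packet;    /* records not in a full packet */
}
```

For example, 8000 records sent 15 at a time give 533 full packets and 5 leftover records, while 8000 records sent 5 at a time give 1600 full packets and no leftover.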

It is important to note that for all cases, the same input record was utilized;
thus, for all three models the number of calls made to SGP4 and SDP4 was the same.

E.  RESULTS 

1.  Read  Time 

The time to read the data file (consisting of 8000 records) varied from
approximately 39 seconds to 1100 seconds. Thus, the read time was extremely dependent
on the load on the system at the time the data file was read. This was in contrast to the
results found by Ford and Carvalho (1993); the number of input records used in their
research was 630 and the read time was approximately 5 seconds for each execution.

2. Endtoend Time


The endtoend time is the most important time considered because it is a reflection
of the total performance of each algorithm designed.

a.  Method  Comparison 

For 4 and 8 nodes, the optimal performance was achieved by the Answer Back
Method (ABM). For 16 nodes, with the exception of sending 15, 50, or 55 records at a
time, the ABM was superior. That is, when sending 5, 10, 20, 25, 30, 35, 40, and 45
records simultaneously, the ABM produced the fastest times.

From this point on in this analysis, when a given algorithm is superior in the
majority of the cases (as shown above) the term "in general" will be used. For the case
above, one would say "When 16 nodes were utilized, in general, the Answer Back Method
(ABM) was the best." The following graphs illustrate these results:


(1) Using four nodes the Answer Back Method was the fastest:


Figure 4.1 Four Node Comparison of Models


(2) Using eight nodes the Answer Back Method was the fastest.


Figure  4.2  Eight  Node  Comparison  of  Models 

(3) Using sixteen nodes, in general, the Answer Back Method was fastest.


Figure 4.3 Sixteen Node Comparison of Models

b. Node Comparison


For the analysis comparing the performance of various choices of nodes for a given
algorithm the following conclusions can be made:

(1) For the Answer Back Method, a choice of eight nodes was the best,
closely followed by sixteen nodes.


Figure  4.4  Answer  Back  Model  Node  Comparison 


(2)  For  the  Successive  Deal  I,  a  choice  of  sixteen  nodes  is  superior. 


Figure 4.5 Successive Deal Method I - Node Comparison


(3) For the Successive Deal Method II, a choice of sixteen nodes is
superior. It is not surprising that sixteen nodes is the best choice for both Successive Deal
Methods because both algorithms are very similar. In general, one can note that increasing
the number of nodes utilized should decrease the endtoend time. The Successive Deal
Method II results can be seen in Figure 4.6.


Figure 4.6 Successive Deal Method II - Node Comparison

It is interesting to note that for the Answer Back Model utilizing eight nodes was
superior to sixteen nodes for all cases. This could be attributed to the fact that with
sixteen nodes the communication time (which was naturally greater in the Answer Back
Model) between the master and slaves decreased the advantages of parallelization; whereas,
with eight nodes the advantages of parallelization outweighed the disadvantage of the
communication time between the master and the slaves.

3.  Percent  Worker  Communication 

As one can see from the analysis above, communication time is an important factor
in the performance of a given algorithm.

In "PVM Concurrent Computing System: Evolution, Experiences, and Trends,"
Sunderam, Geist, and Manchek (p. 7, 1993) state that PVM normally operates in a
general purpose networked environment and as a result, raw performance or speedup of a
given application is hard to measure. They go on to state that "in such a scenario, most of
the focus is on communications overhead."

With communications overhead in mind, the time each worker spent
communicating versus the time spent calculating was evaluated. Using average values, the
percent of time the worker communicates was calculated as follows:

$$\%\ \text{Worker Communication Time} = \frac{\text{Average Communication Time}}{\text{Average Calculation Time}} \times 100$$

The goal was to increase the amount of time a worker spent calculating and
decrease the time a worker spent communicating, resulting in a small communication
overhead.
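
The ratio above can be sketched as a small function. It follows the per-worker averaging used at the end of the master listing in Appendix A (totals divided by the number of working nodes, then the ratio multiplied by 100); the function name percent_worker_comm and the numbers in the usage note are hypothetical.

```c
#include <assert.h>

/* Percent worker communication time.
   total_comm and total_calc are the communication and calculation times
   summed over all workers (same unit); workers is the number of working
   nodes. Returns 100 * (average communication / average calculation). */
double percent_worker_comm(double total_comm, double total_calc, int workers)
{
    double avg_comm, avg_calc;

    assert(workers > 0 && total_comm >= 0.0 && total_calc > 0.0);
    avg_comm = total_comm / workers;
    avg_calc = total_calc / workers;
    return (avg_comm / avg_calc) * 100.0;
}
```

With made-up totals of 12 seconds of communication and 48 seconds of calculation over 3 workers, the function returns 25, meaning each worker spends a quarter as long communicating as calculating.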

a.  Model  Comparison 

For  a  given  number  of  nodes,  the  performance  of  the  three  models  in  terms 
of  communication  overhead  was  evaluated  and  the  results  are  as  follows: 

(1) Utilizing four nodes for each model produced varied results; in general,
the ABM and the SDII were the best choices. The minimum percent worker
communication time was attained by the SDII Method when sending 35 satellite input
records at a time.

Figure 4.7 Percent Worker Communication For Each Model Using 4 Nodes

(2) When utilizing eight nodes, both Successive Deal Models were, in
general, superior to the Answer Back Model. The minimum percent worker
communication was attained by SDII when sending 55 satellite input records at a time.


Figure  4.8  Percent  Worker  Communication  For  Each  Model  Using  8  Nodes 


(3) When utilizing sixteen nodes, again the Successive Deal Methods were
superior to the Answer Back Method. The minimum percent worker communication
was attained by the SDII when sending 35 input records at a time.


Figure  4.9  Percent  Worker  Communication  For  Each  Model  Using  16  Nodes 


The Successive Deal II proved to be the best choice in terms of communication
overhead. The Answer Back Method required the additional communication between the
master and slaves which increased the communication overhead. The Successive Deal I
message size was significantly larger, producing slightly inferior results compared with the
Successive Deal II, which continually dealt out small packets of data.

b. Node Comparison

For a given algorithm, the performance of four, eight, and sixteen nodes was
evaluated in terms of communication overhead. The results are as follows:

Figure 4.10 ABM Percent Worker Communication - Node Comparison


(2) For the SDI, in general, the utilization of 4 nodes was the best.

Figure 4.11 SDI Percent Worker Communication - Node Comparison

(3) For SDII the use of four or eight nodes was the best choice.


Figure  4.12  SDII  Percent  Worker  Communication  -  Node  Comparison 

These  results  are  not  surprising  due  to  the  fact  that  for  a  given  algorithm  each 
worker's  calculation  time  is  approximately  constant  (since  they  all  utilize  the  same  input 
record)  and  the  communication  time  between  the  master  and  slaves  is  reduced  when  there 
are  fewer  slaves. 

4.  Speedup 

As  mentioned  earlier,  in  a  general  purpose  network  environment,  speedup  is  hard  to 
measure  with  a  great  deal  of  confidence. 


Recall, speedup ($S_p$) is calculated as follows:

$$S_p = \frac{T_1}{T_p}$$

where $T_1$ = Endtoend Time on a Single Processor
$T_p$ = Endtoend Time on p Processors

Ideally, the speedup equals "p", the number of processors; however, due to
communication costs, sequential bottlenecks, and computational tasks not necessary on a
single processor, the speedup is less than "p".
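
The speedup definition can be written as a one-line function. The name speedup and the times used in the usage note are hypothetical and are not measurements from this study.

```c
#include <assert.h>

/* Speedup S_p = T_1 / T_p.
   t1 is the endtoend time on a single processor and tp the endtoend
   time on p processors; both must be positive and in the same unit. */
double speedup(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}
```

For instance, a made-up sequential time of 200 seconds and a parallel time of 80 seconds give a speedup of 2.5, regardless of how many processors were used to reach the 80 seconds.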

With the limitations of speedup results discussed above in mind, the following
results were found to be true.


a. Model Comparison

(1) Utilizing four nodes for each model, the ABM was superior.


Figure  4.13  Speedup  Model  Comparison  When  Using  Four  Nodes 


(2) Utilizing eight nodes for each model, the ABM was superior.


Figure  4.14  Speedup  Model  Comparison  When  Using  Eight  Nodes 


(3) Utilizing sixteen nodes, the ABM, in general, was the best.


Figure 4.15 Speedup Model Comparison When Using Sixteen Nodes

b. Node Comparison


(1) For the Answer Back Model, using 8 or 16 nodes was superior.


Figure 4.16 Answer Back Model Speedup

(2) For the Successive Deal I the use of 16 nodes was superior.


Figure 4.17 Successive Deal I Speedup


(3)  For  the  Successive  Deal  II,  utilizing  16  nodes  was  superior. 


Figure  4.18  Successive  Deal  II  Speedup 


These speedup results are directly related to endtoend performance. If one
compares Figures 4.4-4.6, the endtoend times for each model, and Figures 4.16-4.18 of
speedup above, an inverse relationship is noted.

5.  Efficiency 

Recall,

$$\text{Efficiency} = E = \frac{S_p}{p}$$

where $S_p$ = Speedup for p processors
p = Number of processors
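
As a worked illustration of this definition, the sketch below divides a speedup by the number of processors. The function name efficiency is hypothetical; the usage note applies it to the roughly 2.6-fold speedup on 8 workstations reported in the Conclusions chapter.

```c
#include <assert.h>

/* Efficiency E = S_p / p: the speedup obtained per processor.
   A value of 1.0 would mean the ideal speedup of p was reached. */
double efficiency(double sp, int p)
{
    assert(sp >= 0.0 && p > 0);
    return sp / (double)p;
}
```

A speedup of 2.6 on 8 processors corresponds to an efficiency of about 0.325, so each workstation contributes roughly a third of what an ideal parallelization would deliver.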

Thus, the efficiency is a measure of the speedup per processor, or how close the
actual speedup is to the theoretical speedup (p). The efficiency was evaluated in terms of a
comparison of models and a comparison of the node performance for a given model. The
results are as follows:

a. Model Comparison


(1) Utilizing 4 nodes the Answer Back Model was superior.


Figure  4.19  Four  Node  Efficiency  Model  Comparison 


(2) Utilizing 8 nodes, the Answer Back Model was superior.


Figure 4.20 Eight Node Efficiency Model Comparison

(3) For sixteen nodes, there was a large fluctuation for all models; however,
in general the Answer Back Model was the best choice.


b.  Node  Comparison 

(1) For the ABM, the utilization of 4 or 8 nodes was superior.


Figure 4.21 Answer Back Model Efficiency


Figure 4.22 Successive Deal I Efficiency

Figure 4.23 Successive Deal II Efficiency


It is important to note that with the use of an open network, there are great
fluctuations in the amount of time taken to perform a given task. The execution time
depends on the number of current system users and the percentage of the CPU allocated to
each user. For example, if one user is running a large application on a given station and
another user is using this same station for PVM applications, the execution time will
be increased.

In conclusion, considering all factors discussed above, the Answer Back Model
was the best algorithm. When using four, eight, or sixteen nodes, the Answer Back Method
produced, in general, the best endtoend times, speedups, and efficiencies across the sizes
of data packet distributed at one time.

The fastest time resulted with the ABM using eight nodes and sending five satellite
input records at a time. The utilization of 8 nodes gave the best balance between the
parallelization advantage and the communication overhead. The Answer Back Method required
the slaves to notify the master when ready for more data; this reduced the time spent
waiting for data. Additionally, the fastest workers were the ones that processed more data.
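
The effect described above, where faster workers end up processing more data, can be seen in a small simulation. This is only a sketch under strong assumptions: communication cost is ignored, each worker needs a fixed time per set, and the function name answer_back and all numbers are hypothetical rather than taken from the measurements.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORKERS 32

/* Simulate answer-back dealing of nsets equal sets to nworkers workers.
   set_time[w] is the time worker w needs for one set. The next set always
   goes to the worker that becomes free first (ties go to the lowest index).
   On return sets_done[w] holds the number of sets worker w processed;
   the return value is the time at which the last set is finished. */
double answer_back(int nsets, int nworkers, const double *set_time,
                   int *sets_done)
{
    double free_at[MAX_WORKERS];   /* time at which each worker is next idle */
    double finish = 0.0;
    int w, s;

    assert(nsets >= 0 && nworkers > 0 && nworkers <= MAX_WORKERS);
    assert(set_time != NULL && sets_done != NULL);

    for (w = 0; w < nworkers; ++w) {
        free_at[w] = 0.0;
        sets_done[w] = 0;
    }
    for (s = 0; s < nsets; ++s) {
        int next = 0;
        for (w = 1; w < nworkers; ++w)      /* find the first idle worker */
            if (free_at[w] < free_at[next])
                next = w;
        free_at[next] += set_time[next];    /* it works on this set */
        sets_done[next] += 1;
        if (free_at[next] > finish)
            finish = free_at[next];
    }
    return finish;
}
```

With two workers needing 1 and 3 time units per set and 8 sets to deal, the faster worker processes 6 sets, the slower one 2, and everything finishes at time 6. Dealing 4 sets to each worker in advance would instead finish at time 12, because the slow worker would hold up the run.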

In terms of communication overhead, the Successive Deal II Method was superior
to the Successive Deal I and the ABM. The SDII did not have the added communication
between the Master and Slaves that was inherent in the Answer Back Method.

No conclusions can be made regarding the best size data packet to send because,
although sending five input records at a time resulted in the best endtoend time of 73.42
seconds, sending fifty-five records at a time resulted in an endtoend time of
74.85 seconds. Further research would need to be conducted to provide conclusive results
on the optimal size data packet to be distributed.

V. CONCLUSIONS


The  goal  of  this  thesis  is  to  illustrate  how  a  network  of  computer  workstations  is 
used  as  a  parallel  computer  to  solve  a  military  requirement  of  tracking  8000  satellites 
daily. 

The Air Force Space Command (AFSPACECOM) satellite computer code ran
approximately 2.6 times faster when it was parallelized and implemented on the
Parallel Virtual Machine (PVM) using 8 workstations. PVM is a small software package
(~  Mbyte  of  C  source  code)  that  allows  a  network  of  computers  to  appear  as  a 
distributed-memory  parallel  computer. 

Many scientists do not use their workstations all the time and, when applications
are to be run, may need more power than a single workstation can provide. The cost of
allocating large computing resources to each user is rising daily; thus, the use of PVM or
a similar product will be standard in the future.

For  military  applications,  this  work  illustrates  how  to  use  PVM  to  track  satellites 
using  ordinary  workstations.  A  Naval  PVM  application  would  be  to  use  a  system  of 
workstations  located  at  various  enclaves  in  the  ship  to  track  and  destroy  incoming  threats. 
If the ship took a direct hit in one of its enclaves, the crew would be able to choose
unaffected workstations to continue computing, thus reducing the vulnerability of
the ship.

The AFSPACECOM's Simplified General Perturbation Model Four (SGP4) has
been the operational theory since 1976. The SGP4 model uses six classical orbital
elements, a drag factor, and an epoch time to predict a satellite's position and velocity at
a future time.

The SGP4 and its extension, SDP4, are both analytical models. Although the
solutions are not as accurate as numerical techniques, they are computationally less
expensive. A detailed discussion of the SGP4's mathematical theory can be found in
Chapter III.

Currently, D.A. Danielson and B. Neta at the Naval Postgraduate School are
documenting and testing a semi-analytical satellite motion model developed by Draper
Lab. This will increase the accuracy while decreasing the computational cost. See
documentation  by  Danielson,  Early,  and  Neta  (1993)  and  numerical  experiments 
comparing  the  semi-analytics  to  numerical  and  analytical  models  by  Dyer  (1993). 

Three  algorithms  were  developed  to  parallelize  the  AFSPACECOM  code  and  the 
performance  of  each  algorithm  was  tested.  All  three  algorithms  use  a  Master  and  Slave 
approach  with  a  separate  collector  to  collect  the  results  and  send  them  back  to  the 
Master. The Master distributes the data to the Slaves. The Slaves perform all the
calculations necessary to produce the position and velocity vectors for each satellite. The
algorithms differed in the manner in which the data was distributed. Each algorithm was
tested using four, eight, and sixteen workstations.

The algorithm that required the Slaves to notify the Master when ready for more
data resulted in the best times; this method is called the Answer Back Method or ABM.
In the ABM, there was less time spent by the Slaves waiting for more data to process,
which resulted in the fastest workers processing the most data. When using four, eight, or
sixteen workstations, the ABM produced the best total times, speedups, and efficiencies.

One area of further research would include the use of more than sixteen
workstations and an algorithm designed to reduce the bottleneck created by the collecting
node. Perhaps the use of two or more collectors would be advantageous. Additionally,
further  research  should  be  conducted  to  provide  conclusive  results  on  the  optimal  size 
data  packet  to  be  distributed. 

Some of the curves exhibit large fluctuations; this is probably due to changes in
the number of users on the system at the time the data was collected. Further research
should be conducted to test if the results are reproducible to some extent.

The effect of writing the results to an output file was not considered in this
research. Any research conducted in the future should examine the results produced when
including the time required to write to an output file.

In conclusion, the result of this thesis confirms that PVM can be used to track
earth-orbiting satellites. The use of workstations for parallel processing uses untapped
power and decreases the amount of computational time required. As the number of
objects to be tracked and the computational power required increase, this work will
become increasingly important.

APPENDIX A: SOURCE CODE


/************************************************************************
*  sat_master_ab.c                          LAST UPDATE: Oct 5 1993     *
*  LT S.K. Brewer                                                       *
*  This is the master program for the Answer Back Method. It uses PVM   *
*  to simulate a 2D torus of processors; n+1 slaves are spawned, of     *
*  which n are working nodes and 1 is the collecting node.              *
*  Satellite data is issued to the workers in "Answer Back" fashion,    *
*  sending new data to a working node only when the node is ready.      *
*  Timing data, collected for statistical purposes only, are placed in  *
*  the file "timing.ans" which will be placed in the directory from     *
*  which this master program is invoked.                                *
************************************************************************/

#include <stdio.h>       /* INCLUDE STANDARD I/O FUNCTIONS */
#include <stdlib.h>      /* exit() */
#include "pvm3.h"        /* INCLUDE PVM FUNCTIONS */
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>

#define SLAVENAME "at.run"

int main(argc, argv)     /* GET FILE NAME FROM COMMAND LINE */
int argc;
char *argv[];
{
int num_nodes=3;         /* NUMBER OF SLAVE NODES */
int num_satdata=15;      /* input data records distributed */
int num_elements=22;     /* NUMBER OF elements in each data record */
double sat[10000][22];   /* ARRAY OF satellite input records */
int its, nod, size, delta=5;
int num, mytid, i=0, j, k, tids[32], msgtag, reading=1;
int numsat=0, collector, leftover, worker, sets, work_nodes, done=0;
struct timeval ts[4];    /* Number of time stamps */
int who;
float endtoend, tcomm, average=0.0, avcoll=0.0, avcomm=0.0, avcalc=0.0;
/* commtime and calctime are running sums, so they start at zero */
float cmtime, commtime=0.0, cctime, calctime=0.0, readtime, c_comm, avpcm=0.0;
float avpcl=0.0, aa=0.0;
FILE *infile, *timing;
int msgtag99=99;

gettimeofday(&ts[0], (struct timeval *)0);   /* BEGIN READING DATA FILE */

/* OPEN DATA FILE */
if ((infile = fopen(argv[1], "r")) == NULL)
{  printf("infile = %s did not open\n", argv[1]);
   exit(1);
}

/* READ ENTIRE DATA FILE AT ONCE */
while (reading != EOF)
{  if ((reading = fscanf(infile, "%lf", &sat[numsat][0])) != EOF)
      for (j=1; j<num_elements; ++j)
         fscanf(infile, "%lf", &sat[numsat][j]);
   numsat=numsat+1;   /* COUNT NUMBER OF SATELLITES IN DATA FILE */
}
fclose(infile);
numsat=numsat-1;      /* THE LAST PASS HIT EOF, SO DROP ITS COUNT */

gettimeofday(&ts[1], (struct timeval *)0);   /* END READING DATA FILE */

/* SET UP FILE FOR TIMING STATISTICS */
timing = fopen("timing.ans", "a");

readtime = (ts[1].tv_sec-ts[0].tv_sec)*1000000+ts[1].tv_usec-ts[0].tv_usec;
/* readtime is a float, so it is printed with a floating-point format */
fprintf(timing, "Time to read data file = %.0f microseconds\n", readtime);
for(size=0; size<55; size+=delta)
{
num_satdata = size + 5;
for(nod=0; nod<3; ++nod)
{
if (nod == 0)
   num_nodes = 3;
else
   if (nod == 1)
      num_nodes = 7;
   else num_nodes = 15;

fprintf(timing, "sats, nodes, endtoend collector_comm worker_comm worker_calc\n");
fprintf(timing, "%d %d\n", num_satdata, num_nodes);

for(its=0; its<1; ++its)
{
gettimeofday(&ts[2], (struct timeval *)0);   /* BEGIN END TO END TIME */
/********** ENROLL IN PVM ***********/
mytid = pvm_mytid();

/* START UP SLAVE TASKS */
num=pvm_spawn(SLAVENAME, (char**)0, 0, num_nodes, tids);
collector=tids[0];

/* SEND SLAVES THEIR INDICES INTO THE TID ARRAY */
msgtag=1;
for (i=0; i<num_nodes; ++i)
{  pvm_initsend(PvmDataRaw);
   pvm_pkint(&i, 1, 1);
   if (i==0)
      pvm_pkint(&numsat, 1, 1);      /* TELL COLLECTOR NUMBER OF SATS */
   else
      pvm_pkint(&collector, 1, 1);   /* TELL WORKERS COLLECTOR'S ADDRESS */
   pvm_send(tids[i], msgtag);
}


/* SEND SETS OF SATELLITE DATA TO WORKERS, WAITING FOR ANSWER BACK */
msgtag=2;
k=0;
work_nodes=num_nodes-1;
sets=numsat/num_satdata;
leftover=numsat-sets*num_satdata;
i=0;

for (j=1; j<num_nodes; ++j)   /* DEAL ONE SET OF SATELLITES TO EACH WORKER */
{  pvm_initsend(PvmDataRaw);
   pvm_pkint(&num_satdata, 1, 1);
   for (k=0; k<num_satdata; ++k)
   {  pvm_pkdouble(sat[i], num_elements, 1);
      i=i+1;
   }
   pvm_send(tids[j], msgtag);
   sets=sets-1;
}

while (sets>0)   /* DEAL REMAINING SETS TO WORKERS AS THE NODES BECOME FREE */
{  pvm_initsend(PvmDataRaw);
   pvm_pkint(&num_satdata, 1, 1);
   for (k=0; k<num_satdata; ++k)
   {  pvm_pkdouble(sat[i], num_elements, 1);
      i=i+1;
   }
   sets=sets-1;
   pvm_recv(-1, msgtag99);          /* WAIT FOR ANY WORKER TO ANSWER BACK */
   pvm_upkint(&who, 1, 1);
   pvm_send(tids[who], msgtag);
}

if (leftover>0)   /* SEND LEFTOVERS TO WHOEVER IS READY NEXT */
{  pvm_initsend(PvmDataRaw);
   pvm_pkint(&leftover, 1, 1);
   for (k=0; k<leftover; ++k)
   {  pvm_pkdouble(sat[i], num_elements, 1);
      i=i+1;
   }
   pvm_recv(-1, msgtag99);
   pvm_upkint(&who, 1, 1);
   pvm_send(tids[who], msgtag);
}

pvm_initsend(PvmDataRaw);
pvm_pkint(&done, 1, 1);   /* TELL WORKERS NO MORE DATA IS COMING */
pvm_mcast(tids, num_nodes, msgtag);

msgtag=5;   /* RECEIVE PROGRAM COMPLETE SIGNAL FROM COLLECTOR */
pvm_recv(-1, msgtag);

/* COMPLETE END TO END TIME */
gettimeofday(&ts[3], (struct timeval *)0);

/* GATHER TIMING STATISTICS FROM SLAVES */
/* NOTE: the float timing values are moved with the long unpack routine; */
/* with PvmDataRaw this assumes float and long have the same size.       */
msgtag=4;
for (i=0; i<num_nodes; ++i)
{  pvm_recv(-1, msgtag);
   pvm_upkint(&who, 1, 1);
   if (who == 0)   /* TIMES FROM COLLECTOR */
   {
      pvm_upklong(&c_comm, 1, 1);
   }
   else            /* TIMES FROM WORKERS */
   {
      pvm_upklong(&cctime, 1, 1);
      calctime=calctime+cctime;
      pvm_upklong(&cmtime, 1, 1);
      commtime=commtime+cmtime;
   }
}
pvm_exit();

/* COMPUTE OVERALL TIMING STATISTICS */
endtoend=(float)(ts[3].tv_sec-ts[2].tv_sec)*1000000+
         (float)ts[3].tv_usec-(float)ts[2].tv_usec;

/* convert to seconds */
c_comm=c_comm/1.0E6;
endtoend=endtoend/1.0E6;
commtime=commtime/1.0E6;
calctime=calctime/1.0E6;

/* TOTAL TIME */
average = average + endtoend;
avcoll = avcoll + c_comm;      /* collector communication time */
avcomm = avcomm + commtime;    /* worker communication time */
avcalc = avcalc + calctime;    /* worker calculation time */
fprintf(timing, "%6.2f %6.2f %6.2f %6.2f\n", endtoend, c_comm, commtime, calctime);
}

average = average/its;
avcoll = avcoll/its;
avcomm = avcomm/its;
avcalc = avcalc/its;
avpcm=avcomm/(num_nodes-1);    /* average communication time per worker */
avpcl=avcalc/(num_nodes-1);    /* average calculation time per worker */
aa=(avpcm/avpcl)*100;          /* percent worker communication */

/* print results to output file - not shown in this code */
}
}

fclose(timing);
printf("ENTIRE SEQUENCE COMPLETE");
}


/***************************************************************
*  sat_slave_ab.c                 LAST UPDATE: 05 OCT 1993     *
*  Susan Brewer                                                *
*  This is the slave program for the Answer Back Method.       *
*  It uses PVM to simulate a 2D torus of processors.           *
*  The slave with index 0 will be the collecting node.         *
*  This program "answers back" for more data.                  *
*  The Fortran sub-routine "sgp4m" is called to perform the    *
*  calculations for orbit prediction                           *
***************************************************************/

#include "pvm3.h"          /* INCLUDE PVM FUNCTIONS */
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>

main()
{
double results[7*100+1];   /* ARRAY OF RESULTS */
int num_elements=22;       /* FIELDS IN INPUT SATELLITE RECORD */
double sat_data[22];       /* ONE SATELLITE INPUT RECORD */
int max=8000, sats=1;
int sat_no;
int i, j, k, t, r_length;  /* COUNTERS */
int tids[32];              /* ARRAY OF PROCESSOR IDS */
int mytid, numnode;        /* MY PROCESSOR ID */
int me, collector;         /* MY INDEX INTO THE TIDS ARRAY */
int master, msgtag, msgtag2=2, msgtag3=3, msgtag99=99;
struct timeval ts[4];
int res_sets=0;
float s=0.0, u=0.0, totaltime, calc, comm;
extern sgp4m_();           /* EXTERNAL SUB-ROUTINE FOR ORBIT PREDICTION */

mytid = pvm_mytid();   /* ENROLL IN PVM */
master=pvm_parent();

/* RECEIVE MY INDEX AND COLLECTOR'S TID FROM MASTER */
gettimeofday(&ts[0], (struct timeval *)0);
msgtag = 1;
pvm_recv(-1, msgtag);
pvm_upkint(&me, 1, 1);          /* GET MY INDEX IN THE ARRAY OF TIDs */
pvm_upkint(&collector, 1, 1);   /* GET THE COLLECTING NODE'S TID */

if (me == 0)   /* IF I AM THE COLLECTING NODE: */
{
   for(i=0; i<max; ++i)
   {
      pvm_recv(-1, msgtag3);
      pvm_upkint(&sat_no, 1, 1);   /* RECEIVE RESULT SETS */
      pvm_upkint(&r_length, 1, 1);
      pvm_upkdouble(results, r_length, 1);
   }
   msgtag=5;   /* TELL MASTER ALL RESULTS HAVE BEEN RECEIVED */
   pvm_initsend(PvmDataRaw);
   pvm_send(master, msgtag);
}

else   /* IF I AM A WORKING NODE: */
{
   while (sats>0)   /* REPEAT UNTIL MASTER SENDS DONE SIGNAL */
   {  pvm_recv(-1, msgtag2);
      pvm_upkint(&sats, 1, 1);
      for (i=0; i<sats; ++i)
      {  pvm_upkdouble(sat_data, num_elements, 1);
         sat_no=(int)sat_data[1];

         gettimeofday(&ts[2], (struct timeval *)0);
         sgp4m_(sat_data, results);   /* CALL SUB-ROUTINE */
         gettimeofday(&ts[3], (struct timeval *)0);
         s=s+ts[3].tv_sec-ts[2].tv_sec;
         u=u+ts[3].tv_usec-ts[2].tv_usec;

         r_length=7*(int)results[0]+1;   /* NUMBER OF RESULTS RECORDS */
         pvm_initsend(PvmDataRaw);
         pvm_pkint(&sat_no, 1, 1);       /* SATELLITE NUMBER */
         pvm_pkint(&r_length, 1, 1);
         pvm_pkdouble(results, r_length, 1);   /* PACK */
         pvm_send(collector, msgtag3);   /* SEND */
      }
      pvm_initsend(PvmDataRaw);   /* TELL MASTER I'M READY FOR MORE DATA */
      pvm_pkint(&me, 1, 1);
      pvm_send(master, msgtag99);
   }
}
/* TIMING STATISTICS TO BE SENT TO MASTER */
gettimeofday(&ts[1], (struct timeval *)0);
totaltime=(float)(ts[1].tv_sec-ts[0].tv_sec)*1000000+
          (float)ts[1].tv_usec-(float)ts[0].tv_usec;
calc = s*1000000 + u;
comm = totaltime - calc;
msgtag=4;

pvm_initsend(PvmDataRaw);
pvm_pkint(&me, 1, 1);
if (me == 0)
{
   pvm_pklong(&totaltime, 1, 1);
}
else
{
   pvm_pklong(&calc, 1, 1); pvm_pklong(&comm, 1, 1);
}
pvm_send(master, msgtag);
pvm_exit();
}


/********************************************************************
*  sat_master_SDI.c                  LAST UPDATE: Oct 12 1993       *
*                                    LT S.K. BREWER                 *
*  This is the master program for the Successive Deal Method I.     *
*  It uses PVM to simulate a 2D torus of processors; n+1 slaves     *
*  are spawned, of which n are working nodes and 1 is the           *
*  collecting node. Satellite data is issued to the workers by      *
*  first dealing one data package (num_satdata) to each worker,     *
*  then dealing 1/(2*working nodes) times the number of data sets   *
*  left (num_sets), followed by a final deal of equal packets to    *
*  each worker. Any leftover records are sent last. Timing data,    *
*  collected for statistical purposes only, are placed in the       *
*  file "timing" which will be placed in the directory from which   *
*  this master program is invoked.                                  *
********************************************************************/

#include <stdio.h>          /* INCLUDE STANDARD I/O FUNCTIONS */
#include "pvm3.h"           /* INCLUDE PVM FUNCTIONS */
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>

#define SLAVENAME "t.run"

int main(argc, argv)        /* GET FILE NAME FROM COMMAND LINE */
int argc;
char *argv[];
{
  int num_nodes;            /* NUMBER OF SLAVE NODES */
  int num_satdata;          /* NUMBER OF INPUT DATA RECORDS */
  int num_elements=22;      /* NUMBER OF ELEMENTS */
  double sat[10000][22];    /* ARRAY */
  int its, nod, size, delta=5;
  int num, mytid, i=0, j=0, k=0, s=0, tids[32], msgtag;
  int numsat=0, collector, reading=1;
  int leftover=0, setsleft=0, worker=0, sets=0, num_sets=0;
  int work_nodes=0, done=0;
  struct timeval ts[4];     /* Time Stamps required */
  int who;
  float endtoend=0.0, tcomm=0.0, average=0.0, avcoll=0.0;
  float avcomm=0.0, avcalc=0.0, c_comm, avpcm=0.0, avpcl=0.0, aa=0.0;
  float cmtime, commtime=0.0, cctime, calctime=0.0, readtime;  /* accumulators start at zero */
  FILE *infile, *timing;


  /* BEGIN READING DATA FILE */
  gettimeofday(&ts[0], (struct timeval *)0);

  /* OPEN DATA FILE */
  if ((infile = fopen(argv[1], "r")) == NULL)
  {
    printf("infile = %s did not open\n", argv[1]);
    exit(1);
  }

  /* READ ENTIRE DATA FILE AT ONCE */
  while (reading != EOF)
  {
    if ((reading = fscanf(infile, "%lf", &sat[numsat][0])) != EOF)
      for (j=1; j<num_elements; ++j)
        fscanf(infile, "%lf", &sat[numsat][j]);
    numsat=numsat+1;        /* NUMBER OF SATELLITES IN DATA FILE */
  }
  fclose(infile);
  numsat=numsat-1;          /* THE PASS THAT HIT EOF WAS COUNTED TOO */

  /* END READING DATA FILE */
  gettimeofday(&ts[1], (struct timeval *)0);

  /* SET UP FILE FOR TIMING STATISTICS */
  timing = fopen("timing", "a");

  readtime = (ts[1].tv_sec-ts[0].tv_sec)*1000000+
             ts[1].tv_usec-ts[0].tv_usec;

  /* readtime is a float, so it is printed with a floating-point format */
  fprintf(timing, "Time to read data file = %.0f microseconds\n", readtime);
  for(size=0; size<55; size+=delta)
  {
    num_satdata = size + 5;
    for(nod=0; nod<3; ++nod)
    {
      if(nod == 0)
        num_nodes = 3;
      else if(nod == 1)
        num_nodes = 7;
      else
        num_nodes = 15;
      for(its=0; its<10; ++its)
      {
        leftover=0;
        setsleft=0;
        sets=0;
        num_sets=0;

        gettimeofday(&ts[2], (struct timeval *)0);  /* BEGIN END TO END TIME */
        /********** ENROLL IN PVM ***********/
        mytid = pvm_mytid();

        /* START UP SLAVE TASKS */
        num=pvm_spawn(SLAVENAME, (char**)0, 0, "", num_nodes, tids);
        collector=tids[0];

        /* SEND SLAVES THEIR INDICES INTO THE TID ARRAY */
        msgtag=1;
        for (i=0; i<num_nodes; ++i)
        {
          pvm_initsend(PvmDataRaw);
          pvm_pkint(&i, 1, 1);
          if (i==0)
            pvm_pkint(&numsat, 1, 1);     /* COLLECTOR IS TOLD HOW MANY RESULTS TO EXPECT */
          else
            pvm_pkint(&collector, 1, 1);  /* WORKERS ARE TOLD THE COLLECTOR'S TID */
          pvm_send(tids[i], msgtag);
        }

        /* SEND SETS OF SATELLITE DATA TO WORKERS */
        msgtag=2;
        k=0;
        work_nodes=num_nodes-1;
        sets=numsat/num_satdata;
        leftover=numsat-sets*num_satdata;
        i=0;

        for(j=1; j<num_nodes; ++j)   /* DEAL SET OF SATS TO EACH WORKER */
        {
          pvm_initsend(PvmDataRaw);
          pvm_pkint(&num_satdata, 1, 1);
          for (k=0; k<num_satdata; ++k)
          {
            pvm_pkdouble(sat[i], num_elements, 1);
            i=i+1;
          }
          pvm_send(tids[j], msgtag);
        }

        sets=sets-work_nodes;
        num_sets=sets/(2*work_nodes);

        for(j=1; j<num_nodes; ++j)   /* Deal 1/2p records */
        {
          for(s=0; s<num_sets; ++s)
          {
            pvm_initsend(PvmDataRaw);
            pvm_pkint(&num_satdata, 1, 1);
            for (k=0; k<num_satdata; ++k)
            {
              pvm_pkdouble(sat[i], num_elements, 1);
              i=i+1;
            }
            pvm_send(tids[j], msgtag);
          }
        }

        sets=sets-(num_sets*work_nodes);
        num_sets=sets/work_nodes;
        setsleft=sets-(num_sets*work_nodes);

        /* Deal remaining records in equal packets */
        for(j=1; j<num_nodes; ++j)
        {
          for(s=0; s<num_sets; ++s)
          {
            pvm_initsend(PvmDataRaw);
            pvm_pkint(&num_satdata, 1, 1);
            for (k=0; k<num_satdata; ++k)
            {
              pvm_pkdouble(sat[i], num_elements, 1);
              i=i+1;
            }
            pvm_send(tids[j], msgtag);
          }
        }

        if(setsleft>0)   /* send leftover sets */
        {
          for(s=0; s<setsleft; ++s)
          {
            pvm_initsend(PvmDataRaw);
            pvm_pkint(&num_satdata, 1, 1);
            for (k=0; k<num_satdata; ++k)
            {
              pvm_pkdouble(sat[i], num_elements, 1);
              i=i+1;
            }
            pvm_send(tids[1], msgtag);
          }
        }

        if(leftover>0)   /* send leftover records */
        {
          pvm_initsend(PvmDataRaw);
          pvm_pkint(&leftover, 1, 1);
          for (j=0; j<leftover; ++j)
          {
            pvm_pkdouble(sat[i], num_elements, 1);
            i=i+1;
          }
          pvm_send(tids[1], msgtag);
        }

        pvm_initsend(PvmDataRaw);
        pvm_pkint(&done, 1, 1);   /* TELL WORKERS NO MORE DATA IS COMING */
        pvm_mcast(tids, num_nodes, msgtag);

        msgtag=5;   /* RECEIVE PROGRAM COMPLETE SIGNAL FROM COLLECTOR */
        pvm_recv(-1, msgtag);

        gettimeofday(&ts[3], (struct timeval *)0);   /* END TO END TIME */

        /* GATHER TIMING STATISTICS FROM SLAVES */
        msgtag=4;
        for (i=0; i<num_nodes; ++i)
        {
          pvm_recv(-1, msgtag);
          pvm_upkint(&who, 1, 1);
          if (who == 0)   /* TIMES FROM COLLECTOR */
          {
            pvm_upklong(&c_comm, 1, 1);   /* TIME COLLECTOR COMM */
          }
          else            /* TIMES FROM WORKERS */
          {
            pvm_upklong(&cctime, 1, 1);
            calctime=calctime+cctime;
            pvm_upklong(&cmtime, 1, 1);
            commtime=commtime+cmtime;
          }
        }
        pvm_exit();

        /* COMPUTE OVERALL TIMING STATISTICS */


        /* COMM TIME */
        endtoend=(float)(ts[3].tv_sec-ts[2].tv_sec)*1000000+
                 (float)ts[3].tv_usec-(float)ts[2].tv_usec;

        /* convert to seconds */
        c_comm=c_comm/1.0E6;
        endtoend=endtoend/1.0E6;
        commtime=commtime/1.0E6;
        calctime=calctime/1.0E6;

        /* TOTAL TIME */
        average = average + endtoend;
        avcoll = avcoll + c_comm;      /* collector communication time */
        avcomm = avcomm + commtime;    /* worker communication time */
        avcalc = avcalc + calctime;    /* worker calculation time */
        endtoend = 0.0; calctime = 0.0; commtime = 0.0; c_comm = 0.0;
      }

      average = average/its;
      avcoll = avcoll/its;
      avcomm = avcomm/its;
      avcalc = avcalc/its;
      avpcm=avcomm/(num_nodes-1);
      avpcl=avcalc/(num_nodes-1);
      aa=(avpcm/avpcl)*100;

      /* Print results to output file - not shown in this code */

      average=0.0;
      avcoll=0.0;
      avcomm=0.0;
      avcalc=0.0;
      avpcm=0.0;
      avpcl=0.0;
      aa=0.0;
    }
  }

  fclose(timing);
  printf("ENTIRE SEQUENCE COMPLETE - results have been appended to timing");
}
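
The three-phase deal in sat_master_SDI.c can be hard to follow from the send loops alone. The following is a minimal sketch of only the packet arithmetic, with no PVM calls. The struct `deal_plan` and the function `plan_deal` are hypothetical names introduced here for illustration; the variable names inside mirror those of the listing. It assumes there are at least as many whole packets as workers, which the listing also relies on without checking.

```c
#include <assert.h>

/* Packet counts for one run of the three-phase deal (hypothetical struct). */
struct deal_plan {
    int first;      /* packets per worker in the opening deal (always 1) */
    int second;     /* packets per worker in the 1/(2p) deal */
    int final;      /* packets per worker in the final equal deal */
    int setsleft;   /* whole packets left over, sent to tids[1] */
    int leftover;   /* records that do not fill a whole packet */
};

struct deal_plan plan_deal(int numsat, int num_satdata, int work_nodes)
{
    struct deal_plan p;
    int sets;

    assert(num_satdata > 0 && work_nodes > 0);
    assert(numsat / num_satdata >= work_nodes);  /* assumption stated above */

    sets = numsat / num_satdata;                 /* whole packets available */
    p.leftover = numsat - sets * num_satdata;    /* partial packet */

    p.first = 1;
    sets = sets - work_nodes;                    /* after the opening deal */

    p.second = sets / (2 * work_nodes);          /* half of what remains, split evenly */
    sets = sets - p.second * work_nodes;

    p.final = sets / work_nodes;                 /* equal share of the rest */
    p.setsleft = sets - p.final * work_nodes;
    return p;
}
```

For example, 8000 records in packets of 5 across 14 workers give 1600 packets: each worker receives 1, then 56, then 57 packets, and 4 whole packets remain for the first worker.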


/***********************************************************
*                                                          *
*  sat_slave_SDI.c          LAST UPDATE: 12 OCT 1993       *
*                           LT S.K. BREWER                 *
*  This is the slave program for Successive Deal I.        *
*  It uses PVM to simulate a 2D torus of processors.       *
*  The slave with index 0 will be the collecting node.     *
*  The Fortran sub-routine "sgp4m" is called to perform    *
*  the calculations for orbit prediction.                  *
***********************************************************/


#include "pvm3.h"           /* INCLUDE PVM FUNCTIONS */
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>


main()
{
  double results[7*100+1];     /* ARRAY OF RESULTS */
  int num_elements=22;         /* NUMBER OF FIELDS */
  double sat_data[22];         /* ONE SATELLITE INPUT RECORD */
  int sats=1, maxsats;
  int sat_no;
  int i, j, k, t, r_length;    /* COUNTERS */
  int tids[32];                /* ARRAY OF PROCESSOR IDS */
  int mytid, numnode;          /* MY PROCESSOR ID */
  int me, collector;           /* MY INDEX INTO THE TIDS ARRAY */
  int master, msgtag, msgtag2=2, msgtag3=3;
  struct timeval ts[4];
  float s=0.0, u=0.0, totaltime, calc, comm;
  extern sgp4m_();             /* EXTERNAL SUB-ROUTINE */

  mytid = pvm_mytid();         /* ENROLL IN PVM */
  master=pvm_parent();

  /* RECEIVE MY INDEX AND COLLECTOR'S TID FROM MASTER */
  gettimeofday(&ts[0], (struct timeval *)0);
  msgtag = 1;
  pvm_recv(-1, msgtag);
  pvm_upkint(&me, 1, 1);       /* GET MY INDEX IN THE ARRAY OF TIDs */
  pvm_upkint(&collector, 1, 1);

  if (me == 0)   /* IF I AM THE COLLECTING NODE */

  {
    maxsats=collector;   /* FOR THE COLLECTOR THIS FIELD HOLDS THE RESULT COUNT */
    for(i=0; i<maxsats; ++i)
    {
      pvm_recv(-1, msgtag3);
      pvm_upkint(&sat_no, 1, 1);   /* RECEIVE RESULT SETS */
      pvm_upkint(&r_length, 1, 1);
      pvm_upkdouble(results, r_length, 1);
    }
    msgtag=5;   /* TELL MASTER ALL RESULTS HAVE BEEN RECEIVED */
    pvm_initsend(PvmDataRaw);
    pvm_send(master, msgtag);
  }

  else   /* IF I AM A WORKING NODE */
  {
    while(sats>0)   /* REPEAT UNTIL MASTER SENDS DONE SIGNAL */
    {
      pvm_recv(-1, msgtag2);
      pvm_upkint(&sats, 1, 1);
      for (i=0; i<sats; ++i)
      {
        pvm_upkdouble(sat_data, num_elements, 1);
        sat_no=(int)sat_data[1];

        gettimeofday(&ts[2], (struct timeval *)0);
        sgp4m_(sat_data, results);   /* CALL SUB-ROUTINE */
        gettimeofday(&ts[3], (struct timeval *)0);

        s=s+ts[3].tv_sec-ts[2].tv_sec;
        u=u+ts[3].tv_usec-ts[2].tv_usec;

        r_length=7*(int)results[0]+1;
        pvm_initsend(PvmDataRaw);
        pvm_pkint(&sat_no, 1, 1);            /* SATELLITE NUMBER */
        pvm_pkint(&r_length, 1, 1);
        pvm_pkdouble(results, r_length, 1);  /* PACK */
        pvm_send(collector, msgtag3);        /* SEND */
      }
    }

  }  /* TIMING STATISTICS TO BE SENT TO MASTER */
  gettimeofday(&ts[1], (struct timeval *)0);
  totaltime=(float)(ts[1].tv_sec-ts[0].tv_sec)*1000000+
            (float)ts[1].tv_usec-(float)ts[0].tv_usec;
  calc = s*1000000 + u;
  comm = totaltime - calc;
  msgtag=4;

  pvm_initsend(PvmDataRaw);
  pvm_pkint(&me, 1, 1);
  if (me == 0)
  {
    pvm_pklong(&totaltime, 1, 1);
  }
  else
  {
    pvm_pklong(&calc, 1, 1); pvm_pklong(&comm, 1, 1);
  }
  pvm_send(master, msgtag);
  pvm_exit();
}
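
The slave accumulates seconds and microseconds in `float` variables and combines them as `s*1000000 + u`. A `float` carries about 24 bits of mantissa, so microsecond totals above roughly 16.7 seconds can no longer be represented exactly. The following is a minimal sketch of the same bookkeeping in integer arithmetic. The helpers `elapsed_usec` and `comm_usec` are hypothetical and are not part of the thesis code, and `long long` assumes a C99 compiler, which the original environment did not provide.

```c
#include <assert.h>
#include <sys/time.h>

/* Elapsed microseconds from *start to *end (hypothetical helper). */
long long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    long long sec  = (long long)end->tv_sec  - (long long)start->tv_sec;
    long long usec = (long long)end->tv_usec - (long long)start->tv_usec;
    return sec * 1000000LL + usec;   /* usec may be negative; the sum is still correct */
}

/* Time not spent calculating, mirroring comm = totaltime - calc in the slave. */
long long comm_usec(long long total_usec, long long calc_usec)
{
    assert(calc_usec <= total_usec);
    return total_usec - calc_usec;
}
```

A worker would add `elapsed_usec(&ts[2], &ts[3])` to a running total after each call to the orbit routine and subtract that total from `elapsed_usec(&ts[0], &ts[1])` at the end.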
/***************************************************
*                                                  *
*  sat_master_SDII.c    LAST UPDATE: Oct 13 1993   *
*                       LT S.K. BREWER             *
*  This is the master program for the Successive   *
*  Deal II. It uses PVM to simulate a 2D torus of  *
*  processors; n+1 slaves are spawned, of which n  *
*  are working nodes and 1 is the collecting node. *
*  Satellite data is issued to the workers by      *
*  constantly dealing out equal size data packs.   *
*  Timing data, collected for statistical purposes,*
*  are placed in the file "timrr" which will be    *
*  placed in the directory from which this master  *
*  program is invoked.                             *
***************************************************/

#include <stdio.h>          /* INCLUDE STANDARD I/O FUNCTIONS */
#include "pvm3.h"           /* INCLUDE PVM FUNCTIONS */
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>

#define SLAVENAME "t.run"

int main(argc, argv)        /* GET FILE NAME FROM COMMAND LINE */
int argc;
char *argv[];
{
  int num_nodes;            /* NUMBER OF SLAVE NODES */
  int num_satdata;          /* # input records dealt */
  int num_elements=22;
  double sat[10000][22];    /* ARRAY */
  int its, nod, size, delta=5;
  int num, mytid, i=0, j, k, tids[32], msgtag;
  int numsat=0, collector, leftover, worker, sets;
  int work_nodes, done=0, reading=1;
  struct timeval ts[4];     /* Number of time stamps */
  int who;
  float endtoend, tcomm, average=0.0, avcoll=0.0;
  float avcomm=0.0, avcalc=0.0, readtime, c_comm, avpcm=0.0;
  float cmtime, commtime=0.0, cctime, calctime=0.0, avpcl=0.0, aa=0.0;  /* accumulators start at zero */
  FILE *infile, *timing;

  /* BEGIN READING DATA FILE */
  gettimeofday(&ts[0], (struct timeval *)0);

  /* OPEN DATA FILE */
  if ((infile = fopen(argv[1], "r")) == NULL)
  {
    printf("infile = %s did not open\n", argv[1]);
    exit(1);
  }




  /* READ ENTIRE DATA FILE AT ONCE */
  while (reading != EOF)
  {
    if ((reading = fscanf(infile, "%lf", &sat[numsat][0])) != EOF)
      for (j=1; j<num_elements; ++j)
        fscanf(infile, "%lf", &sat[numsat][j]);
    numsat=numsat+1;        /* COUNT NUMBER OF SATELLITES IN DATA FILE */
  }
  fclose(infile);
  numsat=numsat-1;          /* THE PASS THAT HIT EOF WAS COUNTED TOO */

  /* END READING DATA FILE */
  gettimeofday(&ts[1], (struct timeval *)0);

  /* SET UP FILE FOR TIMING STATISTICS */
  timing = fopen("timrr", "a");
  readtime = (ts[1].tv_sec-ts[0].tv_sec)*1000000+
             ts[1].tv_usec-ts[0].tv_usec;

  for(size=0; size<55; size+=delta)
  {
    num_satdata = size + 5;
    for(nod=0; nod<3; ++nod)
    {
      if(nod == 0)
        num_nodes = 3;
      else if(nod == 1)
        num_nodes = 7;
      else
        num_nodes = 15;

      for(its=0; its<10; ++its)
      {
        /* BEGIN END TO END TIME */
        gettimeofday(&ts[2], (struct timeval *)0);

        /********** ENROLL IN PVM ***********/
        mytid = pvm_mytid();

        /* START UP SLAVE TASKS */
        num=pvm_spawn(SLAVENAME, (char**)0, 0, "", num_nodes, tids);
        collector=tids[0];

        /* SEND SLAVES THEIR INDICES INTO THE TID ARRAY */
        msgtag=1;
        for (i=0; i<num_nodes; ++i)
        {
          pvm_initsend(PvmDataRaw);
          pvm_pkint(&i, 1, 1);
          if (i==0)
            pvm_pkint(&numsat, 1, 1);     /* COLLECTOR IS TOLD HOW MANY RESULTS TO EXPECT */
          else
            pvm_pkint(&collector, 1, 1);  /* WORKERS ARE TOLD THE COLLECTOR'S TID */
          pvm_send(tids[i], msgtag);
        }

        /* SEND SETS OF SATELLITE DATA TO WORKERS */
        msgtag=2;
        k=0;
        work_nodes=num_nodes-1;
        sets=numsat/num_satdata;
        leftover=numsat-sets*num_satdata;

        for (i=0; i<sets; ++i)
        {
          worker = i-(i/work_nodes)*work_nodes+1;   /* CYCLE THROUGH WORKERS 1..work_nodes */
          pvm_initsend(PvmDataRaw);
          pvm_pkint(&num_satdata, 1, 1);
          for(j=0; j<num_satdata; ++j)
          {
            pvm_pkdouble(sat[k], num_elements, 1);
            k=k+1;
          }
          pvm_send(tids[worker], msgtag);
        }

        if (leftover>0)   /* SEND LEFTOVERS */
        {
          pvm_initsend(PvmDataRaw);
          pvm_pkint(&leftover, 1, 1);
          for(j=0; j<leftover; ++j)
          {
            pvm_pkdouble(sat[k], num_elements, 1);
            k=k+1;
          }
          pvm_send(tids[work_nodes], msgtag);
        }

        pvm_initsend(PvmDataRaw);

        /* TELL WORKERS NO MORE DATA IS COMING */
        pvm_pkint(&done, 1, 1);
        for(j=1; j<num_nodes; ++j)
        {
          pvm_send(tids[j], msgtag);
        }

        msgtag=5;   /* RECEIVE PROGRAM COMPLETE SIGNAL FROM COLLECTOR */
        pvm_recv(-1, msgtag);

        /* COMPLETE END TO END TIME */
        gettimeofday(&ts[3], (struct timeval *)0);

        /* GATHER TIMING STATISTICS FROM SLAVES */
        msgtag=4;
        for (i=0; i<num_nodes; ++i)
        {
          pvm_recv(-1, msgtag);
          pvm_upkint(&who, 1, 1);
          if (who == 0)   /* TIMES FROM COLLECTOR */
          {
            pvm_upklong(&c_comm, 1, 1);   /* TIME COLLECTOR SPENT COMMUNICATING */
          }
          else            /* TIMES FROM WORKERS */
          {
            pvm_upklong(&cctime, 1, 1);   /* TIME SPENT CALCULATING RESULTS */
            calctime=calctime+cctime;
            pvm_upklong(&cmtime, 1, 1);   /* TIME SPENT COMMUNICATING OR WAITING */
            commtime=commtime+cmtime;
          }
        }
        pvm_exit();

        /* COMPUTE OVERALL TIMING STATISTICS */

        /* COMM TIME */
        endtoend=(float)(ts[3].tv_sec-ts[2].tv_sec)*1000000+
                 (float)ts[3].tv_usec-(float)ts[2].tv_usec;

        /* convert to seconds */
        c_comm=c_comm/1.0E6;
        endtoend=endtoend/1.0E6;
        commtime=commtime/1.0E6;
        calctime=calctime/1.0E6;

        /* TOTAL TIME */
        average = average + endtoend;
        avcoll = avcoll + c_comm;      /* collector communication time */
        avcomm = avcomm + commtime;    /* worker communication time */
        avcalc = avcalc + calctime;    /* worker calculation time */
        endtoend = 0.0; calctime = 0.0; commtime = 0.0; c_comm = 0.0;
      }

      average = average/its;
      avcoll = avcoll/its;
      avcomm = avcomm/its;
      avcalc = avcalc/its;
      avpcm=avcomm/(num_nodes-1);
      avpcl=avcalc/(num_nodes-1);
      aa=(avpcm/avpcl)*100;

      /* Print statistics to output file - not shown in code */

      average=0.0;
      avcoll=0.0;
      avcomm=0.0;
      avcalc=0.0;
      avpcm=0.0;
      avpcl=0.0;
      aa=0.0;
    }
  }

  fclose(timing);
  printf("ENTIRE SEQUENCE COMPLETE ");
}
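
In sat_master_SDII.c the expression `i-(i/work_nodes)*work_nodes+1` selects the worker for packet `i`. With integer division this is the remainder of `i` divided by `work_nodes`, shifted by one because index 0 of the tids array belongs to the collector. The following is a minimal sketch of that round-robin rule and of the per-worker packet count it implies. Both function names are hypothetical and introduced only for illustration.

```c
#include <assert.h>

/* Index into tids[] of the worker that receives packet i.
   Index 0 is the collector, so workers occupy 1..work_nodes. */
int worker_for_packet(int i, int work_nodes)
{
    assert(i >= 0 && work_nodes > 0);
    return i % work_nodes + 1;
}

/* Number of packets worker w (1-based) receives when `sets` packets
   are dealt round-robin. */
int packets_for_worker(int sets, int work_nodes, int w)
{
    int base, extra;

    assert(sets >= 0 && work_nodes > 0);
    assert(w >= 1 && w <= work_nodes);
    base  = sets / work_nodes;
    extra = sets % work_nodes;       /* the first `extra` workers get one more */
    return base + (w <= extra ? 1 : 0);
}
```

With 1600 packets and 14 workers, workers 1 through 4 receive 115 packets and the rest receive 114, so the load differs by at most one packet before the partial leftover packet is sent to the last worker.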
/***********************************************************
*                                                          *
*  sat_slave_SDII.c         LAST UPDATE: 13 OCT 1993       *
*                           LT S.K. BREWER                 *
*  This is the slave program for Successive Deal II.       *
*  It uses PVM to simulate a 2D torus of processors.       *
*  The slave with index 0 will be the collecting node.     *
*  The Fortran sub-routine "sgp4m" is called to perform    *
*  the calculations for orbit prediction.                  *
***********************************************************/
#include "pvm3.h"           /* INCLUDE PVM FUNCTIONS */
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>


main()
{
  double results[7*100+1];     /* ARRAY OF RESULTS */
  int num_elements=22;         /* NUMBER OF FIELDS */
  double sat_data[22];         /* ONE SATELLITE INPUT RECORD */
  int sats=1, maxsats;
  int sat_no;
  int i, j, k, t, r_length;    /* COUNTERS */
  int tids[32];                /* ARRAY OF PROCESSOR IDS */
  int mytid, numnode;          /* MY PROCESSOR ID */
  int me, collector;           /* MY INDEX INTO THE TIDS ARRAY */
  int master, msgtag, msgtag2=2, msgtag3=3;
  struct timeval ts[4];
  float s=0.0, u=0.0, totaltime, calc, comm;

  extern sgp4m_();             /* EXTERNAL SUB-ROUTINE */

  mytid = pvm_mytid();         /* ENROLL IN PVM */
  master=pvm_parent();

  /* RECEIVE MY INDEX AND COLLECTOR'S TID FROM MASTER */
  gettimeofday(&ts[0], (struct timeval *)0);
  msgtag = 1;
  pvm_recv(-1, msgtag);
  pvm_upkint(&me, 1, 1);       /* GET MY INDEX IN THE ARRAY OF TIDs */
  pvm_upkint(&collector, 1, 1);

  /* IF I AM THE COLLECTING NODE */


  if (me == 0)
  {
    maxsats=collector;   /* FOR THE COLLECTOR THIS FIELD HOLDS THE RESULT COUNT */
    for(i=0; i<maxsats; ++i)
    {
      pvm_recv(-1, msgtag3);
      pvm_upkint(&sat_no, 1, 1);   /* RECEIVE RESULT SETS */
      pvm_upkint(&r_length, 1, 1);
      pvm_upkdouble(results, r_length, 1);
    }
    msgtag=5;   /* TELL MASTER ALL RESULTS HAVE BEEN RECEIVED */
    pvm_initsend(PvmDataRaw);
    pvm_send(master, msgtag);
  }

  else   /* IF I AM A WORKING NODE */
  {
    while(sats>0)   /* REPEAT UNTIL MASTER SENDS DONE SIGNAL */
    {
      pvm_recv(-1, msgtag2);
      pvm_upkint(&sats, 1, 1);
      for (i=0; i<sats; ++i)
      {
        pvm_upkdouble(sat_data, num_elements, 1);
        sat_no=(int)sat_data[1];

        gettimeofday(&ts[2], (struct timeval *)0);
        sgp4m_(sat_data, results);   /* CALL SUB-ROUTINE */
        gettimeofday(&ts[3], (struct timeval *)0);

        s=s+ts[3].tv_sec-ts[2].tv_sec;
        u=u+ts[3].tv_usec-ts[2].tv_usec;

        r_length=7*(int)results[0]+1;
        pvm_initsend(PvmDataRaw);
        pvm_pkint(&sat_no, 1, 1);            /* SATELLITE NUMBER */
        pvm_pkint(&r_length, 1, 1);
        pvm_pkdouble(results, r_length, 1);  /* PACK */
        pvm_send(collector, msgtag3);        /* SEND */
      }
    }

  }  /* TIMING STATISTICS TO BE SENT TO MASTER */
  gettimeofday(&ts[1], (struct timeval *)0);
  totaltime=(float)(ts[1].tv_sec-ts[0].tv_sec)*1000000+
            (float)ts[1].tv_usec-(float)ts[0].tv_usec;
  calc = s*1000000 + u;
  comm = totaltime - calc;
  msgtag=4;

  pvm_initsend(PvmDataRaw);
  pvm_pkint(&me, 1, 1);
  if (me == 0)
  {
    pvm_pklong(&totaltime, 1, 1);
  }
  else
  {
    pvm_pklong(&calc, 1, 1); pvm_pklong(&comm, 1, 1);
  }
  pvm_send(master, msgtag);
  pvm_exit();
}
/**********************************************************
*                                                         *
*  seq.c              LT S.K. BREWER        OCT 25 93     *
*                                                         *
*  This is a sequential version of the satellite orbit    *
*  prediction program using SGP4.                         *
*                                                         *
**********************************************************/

#include <stdio.h>          /* INCLUDE STANDARD I/O FUNCTIONS */
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>


int main(argc, argv)        /* GET FILE NAME FROM COMMAND LINE */
int argc;
char *argv[];
{
  int iterations=50;
  int num_elements=22;

  double sat[32000][22];    /* ARRAY OF SATELLITE INPUT DATA */

  int its;                  /* NUMBER OF ITERATIONS OF THE PROGRAM */
  int i=0, j, k, t, reading=1;
  int numsat=0;
  struct timeval ts[4];     /* Number of Time Stamps Required */
  float endtoend=0.0, average=0.0;
  long readtime;
  int sat_no;
  double results[7*100+1];
  FILE *infile, *timing;

  extern sgp4m_();

  /* BEGIN READING DATA FILE */
  gettimeofday(&ts[0], (struct timeval *)0);

  /* OPEN DATA FILE */
  if ((infile = fopen(argv[1], "r")) == NULL)
  {
    printf("infile = %s did not open\n", argv[1]);
    exit(1);
  }

  /* READ ENTIRE DATA FILE AT ONCE */
  while (reading != EOF)
  {
    if ((reading = fscanf(infile, "%lf", &sat[numsat][0])) != EOF)
      for (j=1; j<num_elements; ++j)
        fscanf(infile, "%lf", &sat[numsat][j]);
    numsat=numsat+1;        /* COUNT NUMBER OF SATELLITES IN DATA FILE */
  }
  fclose(infile);
  numsat=numsat-1;          /* THE PASS THAT HIT EOF WAS COUNTED TOO */

  gettimeofday(&ts[1], (struct timeval *)0);   /* END READING DATA FILE */

  /* SET UP FILE FOR TIMING STATISTICS */
  timing = fopen("timing.seq", "a");
  readtime = (ts[1].tv_sec-ts[0].tv_sec)*1000000+
             ts[1].tv_usec-ts[0].tv_usec;
  for(its=0; its<iterations; ++its)
  {
    gettimeofday(&ts[2], (struct timeval *)0);
    for (i=0; i<numsat; ++i)
    {
      sat_no=(int)sat[i][1];
      sgp4m_(sat[i], results);
    }
    gettimeofday(&ts[3], (struct timeval *)0);
    endtoend=(float)(ts[3].tv_sec-ts[2].tv_sec)*1000000+
             (float)ts[3].tv_usec-(float)ts[2].tv_usec;

    /* convert to seconds */
    endtoend=endtoend/1.0E6;

    /* write results to timing output file */
    fprintf(timing, "\n Endtoend time (sec) = %6.2f\n", endtoend);

    /* Total Time */
    average=average+endtoend;
  }
  average=average/its;
  fprintf(timing, "\n Average Endtoend time (sec)= %6.2f\n", average);
  fclose(timing);
  printf("\nENTIRE SEQUENCE COMPLETE ");
}
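
The master programs and seq.c share one read loop, which increments the record count on the pass that reaches end of file and then subtracts one afterwards. The following is a minimal sketch of an alternative that checks the number of fields `fscanf` converted, so a truncated final record is dropped rather than counted. The function `read_records` and the macro `NUM_ELEMENTS` are hypothetical names introduced for illustration; the field count of 22 is taken from the listings.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_ELEMENTS 22   /* fields per satellite record, as in the listings */

/* Read whole records from fp into sat[]; returns the number of complete
   records stored, at most max_records (hypothetical helper). */
int read_records(FILE *fp, double sat[][NUM_ELEMENTS], int max_records)
{
    int n = 0, j;

    assert(fp != NULL && max_records >= 0);
    while (n < max_records) {
        for (j = 0; j < NUM_ELEMENTS; ++j)
            if (fscanf(fp, "%lf", &sat[n][j]) != 1)
                return n;      /* EOF or unreadable field: drop the partial record */
        ++n;
    }
    return n;
}
```

A caller would pass the opened data file and the satellite array, then use the return value directly as the satellite count with no later correction.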